Network Working Group                  Richard Price, Siemens/Roke Manor
INTERNET-DRAFT                                      Hans Hannu, Ericsson
Expires: August 2002                     Carsten Bormann, TZI/Uni Bremen
                                           Jan Christoffersson, Ericsson
                                                      Zhigang Liu, Nokia
                                         Jonathan Rosenberg, dynamicsoft

                                                       February 14, 2002


                      Signaling Compression (SigComp)
                     <draft-ietf-rohc-sigcomp-04.txt>


Status of this memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/lid-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This document is a submission of the IETF ROHC WG. Comments should be
   directed to its mailing list, rohc@cdt.luth.se.


Abstract

   This document defines SigComp, a solution for compressing messages
   generated by text-based protocols such as SIP [SIP] and RTSP [RTSP].
   The architecture and pre-requisites of SigComp are outlined, along
   with the format of the SigComp message.

   Decompression functionality for the SigComp solution is provided by a
   "Universal Decompressor Virtual Machine" optimized for the task of
   running decompression algorithms. The UDVM can be configured to
   understand the output of many well-known compressors such as
   [DEFLATE].




Price, Hannu, et al.                                            [Page 1]


INTERNET-DRAFT                  SigComp              February 14 , 2002


Table of contents

   1.  Introduction..................................................2
   2.  Terminology...................................................3
   3.  SigComp Architecture..........................................5
   4.  SigComp message flow..........................................11
   5.  SigComp compressor............................................15
   6.  State handling and capability announcement....................18
   7.  Overview of the UDVM..........................................22
   8.  Decompressing a SigComp message...............................26
   9.  UDVM instruction set..........................................30
   10. Security considerations.......................................41
   11. IANA considerations...........................................43
   12. Acknowledgements..............................................43
   13. AuthorsÆ addresses............................................44
   14. References....................................................45
   Appendix A. Mnemonic language.....................................46
   Appendix B. Example application-defined parameters................48
   Appendix C. Example decompression algorithms......................49
   Appendix D. Document history......................................51

1.  Introduction

   The Session Initiation Protocol (SIP) [SIP], along with many other
   application protocols used for multimedia communications such as RTSP
   [RTSP], is a textual protocol engineered for bandwidth rich links. As
   a result, the SIP messages have not been optimized in terms of size.
   Typical SIP messages are from a few hundred bytes to as high as 2000.
   To date, this has not been a significant problem.

   With the planned usage of these protocols in wireless handsets as
   part of 2.5G and 3G cellular networks, the large size of these
   messages is problematic. With low-rate IP connectivity, store-and-
   forward delays are significant. Taking into account retransmits, and
   the multiplicity of messages that are required in some flows, call
   setup and feature invocation are adversely affected. Therefore, we
   believe there is merit in reducing these message sizes.

   This document outlines the architecture and pre-requisites of the
   SigComp solution including the capability announcement and UDVM
   algorithm upload, along with the format of the SigComp message.

   SigComp is typically offered to applications as a "shim" layer
   between the application and the transport. The service provided is
   that of the underlying transport plus compression.

   This document focuses on the signaling scenario where an endpoint
   sends and receives data to/from an outbound/inbound proxy. However,
   SigComp is designed to run over both connectionless and connection-
   oriented transports and hence may be applicable to other scenarios
   with multiple endpoints compressing and decompressing data.




Price, Hannu, et al.                                            [Page 2]


INTERNET-DRAFT                  SigComp              February 14 , 2002


2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC-2119].

   SigComp

     The overall solution for signaling compression, comprising the
     compressor, decompressor, dispatchers and state handler.

   Application

     For the purpose of this document, an application is a text-based
     protocol software that:

     a) sends application data to the compressor dispatcher
     b) receives data from the decompressor dispatcher
     c) decides whether state information may be saved by SigComp

   Transport

     Mechanism for passing data between two instances of an application.
     SigComp is capable of sending messages over a wide range of
     transports including TCP, UDP and [SCTP].

   Message-based transport

     A transport that carries data as a set of distinct, bounded
     messages.

   Stream-based transport

     A transport that carries data as a continuous stream with no
     message boundaries. In this case, SigComp reserves the character
     0xFFFF to delimit messages in the compressed stream.

   Application-defined parameters

     Parameters that must be agreed upon by the applications invoking
     SigComp. Depending on the situation these parameters might be fixed
     a-priori or negotiated.

   Application message

     An uncompressed message, as provided from or to the application,
     which is to be compressed by the compressor. When delivered
     from the decompressor the data has passed through the decompression
     process and is referred to as decompressed data or a decompressed
     message.





Price, Hannu, et al.                                            [Page 3]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   SigComp message

     May contain a compressed application message in the form of UDVM
     bytecode. In case of a message- based transport, such as UDP, a
     SigComp message corresponds to exactly one (UDP) datagram. For a
     stream-based transport, such as TCP, each SigComp message is
     separated by a 0xFFFF delimiter.

   Compressor

     The compressor invokes the encoder, and keeps track of states that
     can be used for compression. It is responsible for supplying UDVM
     bytecode to the remote decompressor in order for compressed
     data to be decompressed.

   Encoder

     Encodes data according to a (compression) algorithm into UDVM
     bytecode. The encoded data can be decoded by a UDVM.

   Compressor dispatcher

     A layer that receives uncompressed application messages, invokes a
     compressor, and forwards the SigComp messages to a remote SigComp
     layer.

   Decompressor

     The decompressor is responsible for converting a SigComp message
     into uncompressed data. Decompression functionality is provided by
     the UDVM.

   Decompressor dispatcher

     A layer that receives SigComp messages, invokes a decompressor, and
     forwards the decompressed application messages to an application.

   Virtual machine

     A machine architecture designed to be implemented in software
     (although silicon implementations are of course possible).

   Universal Decompressor Virtual Machine (UDVM)

     The virtual machine described in this document. The UDVM is used
     for decompression of SigComp messages.

   Bytecode

     Machine code that can be executed by a virtual machine. UDVM
     bytecode is a combination of UDVM instructions and compressed data.




Price, Hannu, et al.                                            [Page 4]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   Per-message compression

     Compression that does not reference data from previous messages.
     SigComp can decompress a message of this type using only the
     application-defined parameters and the data in the message itself.

   Dynamic compression

     Compression relative to messages sent prior to the current
     compressed message. SigComp stores and retrieves this data using
     the state handler.

   State

     Data saved for retrieval by later SigComp messages. The data
     typically reflects the contents of the UDVM memory after
     decompressing a message, but state can also be saved by the
     compressor or by the application.

   State identifier

     Reference used to access an item of state previously saved by the
     compressor, the decompressor or the application.

   CPU cycles

     A measure of the amount of "CPU power" required to execute a UDVM
     instruction (the simplest UDVM instructions require a single CPU
     cycle). An upper limit is placed on the number of cycles that can
     be used to decompress each bit in a compressed message.


3.  SigComp Architecture


   In the SigComp architecture compression and decompression is
   performed at two communicating endpoints. Figure 1 shows the layout
   of a communicating endpoint that implements a SigComp layer. The
   figure does not mandate any particular implementation, but is shown
   to the reader for the sake of clarity.

   SigComp is typically offered to applications as a "shim" layer
   between the application and the transport. Note however that for
   certain applications the compressed SigComp message may be passed
   back to the application itself for additional processing before
   transmission. For example, the application may wish to apply
   encryption to the compressed message before handing it to the
   transport.

   The SigComp layer is common for several text based protocol
   applications (identified as Application 1 and Application 2 in Figure
   1). These applications are not part of the SigComp layer.




Price, Hannu, et al.                                            [Page 5]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   The SigComp layer is further decomposed in the following components:

   - A compressor dispatcher: this is the interface from the
     applications. Application messages are received by the compressor
     dispatcher, and based on the application requirements, the
     compressor dispatcher invokes a particular compressor to achieve
     the desired compression ratio using the allocated processing and
     memory resources. The compressor returns a SigComp message that is
     forwarded to the remote SigComp peer.

   - A decompressor dispatcher: this is the interface towards the
     applications. A SigComp message is received by the decompressor
     dispatcher and an instance of the UDVM is invoked. Once the
     dispatcher has received the (decompressed) application data it
     determines the target application and forward the message to it.

   - One or more compressors: the compressors each contain an algorithm
     to perform the compression. A compressor receives an (uncompressed)
     application message from the compressor dispatcher, compresses the
     message, and returns a SigComp message to the compressor
     dispatcher. During the compression process, the compressor may
     invoke the state handler to restore a previous state or save a new
     one. The compressor is responsible for providing the remote
     decompressor with suitable UDVM bytecode to reconstruct the
     original application message. Within the compressor, the entity
     which runs the actual compression algorithm (minus state management
     issues) is known as the "encoder".

   - One or more decompressors: the decompressors contain the needed
     UDVM to perform the decompression. The decompressor receives a
     SigComp message from the decompressor dispatcher, decompress the
     message, and returns the (decompressed) application message to the
     decompressor dispatcher. During the decompression process, the
     decompressor may invoke the state handler to restore a previous
     state or save a new one.

   - State handler: this entity contains enough logic to store and
     retrieve states. A state is data that is stored between SigComp
     messages: this data can be saved either by a compressor, a
     decompressor or an application. The saved state may be used for
     (de)compression between a compressor and its peer decompressor.
     The state handler is also responsible for asking the application to
     grant permission for states to be saved by the state handler. State
     parameters and retrieval of states are further described in Chapter
     6.










Price, Hannu, et al.                                            [Page 6]


INTERNET-DRAFT                  SigComp              February 14 , 2002


               +-----------------+        +-----------------+
               |                 |        |                 |
               |  Application 1  |<-+  +->|  Application 2  |
               |                 |  |  |  |                 |
               |                 |  |  |  |                 |
               +-----------------+  |  |  +-----------------+
                      |   ^         |  |          | ^
                      |   |  Application msgs.    | |
                      | +-------------------------+ |
       +-- -- -- -- --| | | - -- -- -- -- -- -- --  |-- -- -- -- --+
       |              | | +-----------------------+ |              |
                      v v           |  |          | |
       |    +--------------+        |  |       +--------------+    |
    SigComp |              |        |  |       |              | SigComp
    message |  Compressor  |        |  |       | Decompressor | message
    <-------|  dispatcher  |        |  |       |  dispatcher  |<-------
       |    |              |        |  |       |              |    |
            +--------------+        |  |       +--------------+
       |           ^  ^             |  |            ^  ^           |
                   |  |             |  |            |  |
       |           |  |             |  |            |  |           |
                   |  |             |  |            |  |
       |           v  |             |  |            v  |           |
            +--------------+        v  |       +--------------+
       |    | Compressor 1 |    +---------+    |Decompressor 1|    |
            |              |<-->|  State  |<-->|              |
       |    |  (Encoder)   |    | Handler |    |    (UDVM)    |    |
            |              |    +---------+    |              |
       |    +--------------+           |       +--------------+    |
                      |                |               |
       |              v                |               v           |
            +--------------+           v       +--------------+
       |    | Compressor 2 |    +---------+    |Decompressor 2|    |
            |              |<-->|  State  |<-->|              |
       |    |  (Encoder)   |    | Handler |    |    (UDVM)    |    |
            |              |    +---------+    |              |
       |    +--------------+                   +--------------+    |

       |                       SigComp layer                       |
       +-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --+

     Figure 1: High-level architectural overview of one SigComp peer.

   Note that it is possible to decompress messages from multiple
   compressors at different physical locations in a network. The
   architecture is designed to prevent data from one compressor
   interfering with data from a different compressor. A consequence of
   this design choice is that it is difficult for a malicious user to
   disrupt decompressor operation by inserting false compressed messages
   on the transport.





Price, Hannu, et al.                                            [Page 7]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   The decompressors in Figure 1 should be viewed as containers for
   UDVMs; the actual decompressor functionality is handled by invoking
   an instance of the UDVM.

   Figure 2 gives a more detailed view of a UDVM, including all of the
   interfaces between the UDVM and its environment.


   +----------------+                                 +----------------+
   |                |     Request compressed data     |                |
   |                |-------------------------------->|                |
   |                |<--------------------------------|                |
   |                |     Provide compressed data     |                |
   |                |                                 |   Dispatcher   |
   |                |                                 |                |
   |                |    Output uncompressed data     |                |
   |                |-------------------------------->|                |
   |                |                                 |                |
   |                |                                 +----------------+
   |      UDVM      |
   |                |                                 +----------------+
   |                |    Request state information    |                |
   |                |-------------------------------->|                |
   |                |<--------------------------------|                |
   |                |    Provide state information    |                |
   |                |                                 |     State      |
   |                |                                 |    Handler     |
   |                |   Make state creation request   |                |
   |                |-------------------------------->|                |
   |                | Forward capability announcement |                |
   |                |                                 |                |
   +----------------+                                 +----------------+

         Figure 2: Interfaces between the UDVM and its environment

   Note that for simplicity, the UDVM indicates when it requires
   additional compressed data or state information using an explicit
   instruction. It then pauses and waits for the information to be
   supplied before continuing with the next instruction. This prevents
   the arrival of more data from interfering with the operation of the
   UDVM (e.g. by accidentally overwriting UDVM memory that is currently
   in use).

3.1.  Requirements on application

   From an application perspective the SigComp layer typically appears
   as a new transport, with similar behavior to the original transport
   used to carry uncompressed data (for example SigComp/UDP behaves
   similarly to native UDP).






Price, Hannu, et al.                                            [Page 8]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   If the application wishes to mix SigComp messages with other types of
   data (e.g. uncompressed data) on the same transport then the
   transport must distinguish between the two types of data. For UDP and
   TCP this means that a new port will need to be reserved for
   compressed data. For example [SIP] uses port 5060 for TCP and port
   5061 for TLS/TCP, so it could similarly reserve another port for
   SigComp/TCP.

   In the interests of security, a new interface is required to the
   signaling application in order to leverage the authentication
   functions built into the application itself. For each decompressed
   message that is accompanied by a state creation request, the state
   handler needs to find out whether the application considers the
   message to be legitimate. If the decompressed message is considered
   to be invalid then the state handler cannot create the requested
   state information. This interface is marked on the architecture of
   Figure 1.

3.2.  Application-defined parameters

   When an application invokes SigComp, a number of parameters are
   provided by the application to control the maximum size of compressed
   messages, the UDVM memory size etc. The two instances of the
   application that wish to communicate MUST initially agree on a common
   set of values for these parameters.

   Note that if a reverse channel is available then SigComp can perform
   an internal "capability announcement" to indicate that additional
   memory or CPU cycles are available. This means that it is generally
   sufficient to set fixed values for each application-defined parameter
   (there is no need to provide an external, application-specific
   negotiation mechanism).

   Each application-defined parameter is described below. Appendix B
   discusses how each of the parameters affects SigComp operation in
   greater detail, and recommends default values for the parameters.

   UDVM_version

     The UDVM_version parameter specifies the level of functionality
     available at the UDVM. The basic version of the UDVM (Version 0)
     is defined in this document.

   minimum_compression_ratio

     The minimum_compression_ratio parameter prevents the generation of
     excessively large SigComp messages. For an n byte uncompressed
     message, the corresponding SigComp message must be no larger than
     (n / minimum_compression_ratio) rounded down to the nearest byte.
     Note that this parameter can be less than 1, (in which case a
     certain amount of message expansion is allowed) or 0 (in which case




Price, Hannu, et al.                                            [Page 9]


INTERNET-DRAFT                  SigComp              February 14 , 2002


     no minimum_compression_ratio needs to be met). Any value other than
     0 bans the creation of standalone SigComp messages (i.e. messages
     that do not contain a compressed application message).

   maximum_compressed_size

     The maximum_compressed_size parameter limits the size of one
     compressed message. SigComp rejects any message larger than the
     specified value.

   maximum_uncompressed_size

     The maximum_uncompressed_size parameter limits the size of one
     uncompressed message. SigComp rejects any message larger than the
     specified value.

   minimum_hash_size

     The minimum_hash_size parameter specifies the minimum size of the
     state identifier when creating new state information. This value
     needs to be sufficiently large to prevent malicious users from
     guessing a state identifier by brute force.

   overall_memory_size

     The overall_memory_size parameter specifies the total number of
     bytes in the UDVM memory.

   working_memory_start

     The working_memory_start parameter specifies the start of the UDVM
     memory area that can be modified. Memory addresses below this
     value are considered read-only by the UDVM.

   working_memory_end

     The working_memory_end parameter specifies the end of the UDVM
     memory area that can be modified. Memory addresses above this
     value are considered read-only by the UDVM.

   cycles_per_bit

     The cycles_per_bit parameter specifies the number of "CPU cycles"
     that can be used to decompress a single bit of data. One CPU cycle
     typically corresponds to a single UDVM instruction, although some
     of the high-level instructions may require additional cycles.

   cycles_per_message

     The cycles_per_message parameter specifies the number of additional
     CPU cycles made available at the start of a compressed message.




Price, Hannu, et al.                                           [Page 10]


INTERNET-DRAFT                  SigComp              February 14 , 2002


     These cycles can be useful when decompressing algorithms that
     upload additional data on a per-message basis, for example a new
     set of Huffman codes as with [DEFLATE].

     The total number of "CPU cycles" available for each compressed
     message is specified by the following formula:

     total_cycles = message_size * cycles_per_bit + cycles_per_message

   first_instruction

     The first_instruction parameter specifies the memory address of the
     first instruction to be executed when the UDVM is initialized.

   Initial memory contents

     When the UDVM is invoked its memory is reset to contents defined by
     the application. This code is executed for every SigComp message
     (so typically it performs a simple task such as extracting the
     first n bytes from the message and interpreting them as a state
     identifier).

   Initial state

     As well as deciding the initial contents of the UDVM memory, the
     application can also store useful information in the form of state.
     This predefined state is used to offer a range of well-known
     decompression algorithms to the compressor, which can choose to
     avoid uploading bytecode for a new algorithm if it supports one of
     the well-known algorithms. Each item of initial state can be made
     mandatory for every instance of the application, or it can be made
     optional (in which case support for the relevant state will need to
     be advertised before the state can be used).

4.  SigComp message flow

   This chapter describes the SigComp message flow, including the
   initialization, capability announcement and exchange of compressed
   messages.

   In the architecture of Figure 1, this chapter describes the operation
   of the compressor and decompressor dispatcher.

4.1.  Message exchange

   The local SigComp layer may send compressed data to a remote SigComp
   layer, and the local SigComp layer may also receive compressed data.
   However, compression in one direction does not necessarily imply
   compression in the reverse direction. Furthermore, even in the case
   that there are two unidirectional compressed flows between two





Price, Hannu, et al.                                           [Page 11]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   SigComp layer, there is no need to use the same compression algorithm
   at both compressors.

4.1.1.  Operation for each compressor-decompressor pair

   An endpoint that wants to send compressed data to a remote party must
   initialize a SigComp layer at the local party prior to its use, so
   that the decompressor dispatcher in the remote endpoint assigns a
   decompressor to be used and the UDVM is loaded with the compression
   algorithm. The process is described in Figure 3.


             +--------------+                  +--------------+
             |              |                  |              |
             |  Endpoint A  |                  |  Endpoint B  |
             |              |                  |              |
             +--------------+                  +--------------+
                    |                                 |
                    |                                 |
                    |        SigComp Discovery        |
                    |<------------------------------->|
                    |                                 |
                    |        SigComp Request          |
                    |-------------------------------->|
                    |                                 |
                    |                                 |
                    |     Capabilities Announcement   |
                    |<--------------------------------|
                    |                                 |
                    |                                 |
                    |           UDVM Upload           |
                    |-------------------------------->|
                    |                                 |
                    |        Compressed Messages      |
                    |- - - -  - -  - - - - - - - - - >|

             Figure 3: Compressor-decompressor pair operation

   The SigComp discovery mechanism itself is outside the scope of this
   specification.

   The following three paragraphs specify a message flow for discovering
   the capabilities of Endpoint B and (if necessary) uploading a new
   decompression algorithm to this endpoint. Note that if an
   application-defined default algorithm is available at all endpoints
   then Endpoint A can immediately begin to compress messages and the
   following stages may be skipped.

   Endpoint A may send a SigComp Request message to Endpoint B. The
   SigComp Request message is a request to initialize a SigComp layer
   for the application at Endpoint B and to know B's capabilities, i.e.




Price, Hannu, et al.                                           [Page 12]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   the parameters in Section 3.2. If Endpoint A uses a UDVM
   decompression algorithm which only requires the default application-
   defined parameters, then this step may be omitted.

   Endpoint B SHOULD answer the SigComp Request message with a
   Capabilities Announcement message, which includes the SigComp
   parameters that constrain the operation of the UDVM.

   Once Endpoint A has received the Capabilities Announcement message,
   it chooses a suitable compression algorithm that B is able to
   decompress and sends a message containing the UDVM decompression
   algorithm (unless Endpoint B already has the algorithm available).

   At this point, Endpoint B contains enough information to start
   decompressing messages received from the application at Endpoint A.

4.1.2.  Bi-directional initialization

   In scenarios where both endpoints decide to compress data in each of
   the directions, a double initialization process must be done prior to
   start with the normal operation.

   The double initialization process is comprised of two of the above
   initialization processes, one in each direction, as described in
   Section 4.1.1.

4.2.  SigComp message format

   The basic SigComp message consists of a block of UDVM bytecode, the
   first n bytes of which are interpreted as a state identifier that
   accesses some previously stored state information.

   This state information comprises the decompression algorithm that
   will be used to decompress the remainder of the SigComp message, as
   well as any needed additional information (e.g. one or more
   previously received messages if dynamic compression is in use).

   A decompressor dispatcher MUST be able to separate two SigComp
   messages; in the case of UDP a SigComp message corresponds exactly to
   one UDP datagram. For TCP each 0xFFFF delimiter is followed by a new
   SigComp message.

   The format of the basic SigComp message is given in Figure 4:

   [Editors' Note: Specific SigComp messages such as the SigComp
   Request, the Capabilities Announcement and the UDVM Upload may need
   to be defined. A state identifier could be reserved for each specific
   type of message, just as state identifiers are reserved for each
   well-known algorithm.]






Price, Hannu, et al.                                           [Page 13]


INTERNET-DRAFT                  SigComp              February 14 , 2002


     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   |                               |
   :   state_identifier (n-bytes)  :
   |                               |
   +---+---+---+---+---+---+---+---+
   |                               |
   :    Remaining UDVM bytecode    :
   |                               |
   +---+---+---+---+---+---+---+---+

    Figure 4: Basic SigComp message

   Note that n is the application-defined parameter minimum_hash_size,
   an example value for which is given in Appendix B.

   Note also that the state information is loaded into the UDVM memory
   and executed as defined by the following piece of UDVM bytecode:

   reserve state_identifier (n)
   INPUT-BYTECODE (n, state_identifier, fail)
   STATE-EXECUTE (state_identifier, n)
   :fail
   DECOMPRESSION-FAILURE

   If the UDVM memory is initialized containing the above bytecode then
   the state identifier will automatically be extracted from the SigComp
   message and the corresponding state information will be accessed and
   executed.

4.3.  Interfaces to and from the dispatcher

   Once the remote party has initialized the SigComp layer at the local
   party, the decompressor dispatcher is ready to receive compressed
   messages from a particular remote party, decompress those messages,
   and pass them onto the application.

   The application provides the compressor dispatcher with messages to
   be compressed. The encoder in the compressor compresses messages in
   such a way that the remote decompressor with the UDVM can decompress
   the data correctly (providing that the compressed data is not lost or
   damaged during transport).

   When a message is to be compressed, the compressor selects the state
   to use. The identifier of the used state MUST be sent along with the
   compressed message to the remote decompressor. The SigComp message is
   then passed on to underlying layers for transport to the remote
   decompressor.

   [Editor's Note: State identifiers will need to be reserved for well-
   known decompression algorithms, and an additional state identifier




Price, Hannu, et al.                                           [Page 14]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   will be needed to indicate that the algorithm is being uploaded as
   part of the compressed message.]

   Upon the arrival of a SigComp message the decompressor dispatcher
   invokes the decompressor that loads the UDVM with the indicated
   state. The message is then decompressed by the UDVM, returned to the
   decompressor dispatcher, and possibly passed on to the receiving
   application.

   Note that when the UDVM is invoked it does not receive any compressed
   data by default, but instead requests new data explicitly using a
   specific instruction. Therefore, the dispatcher is responsible for
   buffering each SigComp message and passing the data to the UDVM when
   it is requested.

   Uncompressed data is also outputted by the UDVM using a specific
   instruction. Depending on the particular application, the dispatcher
   decides whether to forward a partially decompressed message
   immediately to the application, or to buffer and wait for a complete
   message to be successfully decompressed.

   For a stream-based transport, the dispatcher delimits messages by
   parsing the compressed data stream for instances of 0xFF and taking
   the following actions:

   Occurs in data stream:     Action:

   0xFFFF                     Delimit compressed message
   0xFF00                     Replace with 0xFF
   0xFF01 - 0xFFFE            Decompression failure

   The reserved character 0xFF00 is useful for byte stuffing (if a
   compression algorithm generates compressed data containing the
   character 0xFF then it should be replaced by the character 0xFF00 to
   avoid accidentally inserting a message delimiter into the compressed
   data stream).

5.  SigComp compressor

   An important feature of SigComp is that if two endpoints cannot agree
   on a common algorithm with which to send and receive data, it is
   possible for the compressor to upload bytecode for its own choice of
   algorithm to the decompressor. In particular this means that it is
   not necessary to force all compressors to use the same default
   algorithm; instead each implementer has the freedom to pick one of
   the predefined algorithms or to upload their own if needed.

   The overall requirement placed on the compressor is that of
   transparency, i.e. the compressor MUST NOT send bytecode which cause
   the UDVM to incorrectly decompress a given message.





Price, Hannu, et al.                                           [Page 15]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   The following more specific requirements are also placed on the
   compressor (they can be considered particular instances of the
   transparency requirement):

   *    It is RECOMMENDED that the compressor supply a CRC over the
        uncompressed message to ensure that successful decompression has
        occurred. A UDVM instruction is provided to verify this CRC.

   *    If the transport is message-based then the compressor MUST
        preserve the boundaries between messages.

   *    If the transport is stream-based but the application defines its
        own internal message boundaries, then the compressor SHOULD
        preserve the boundaries between messages by using the "end-of-
        message" character 0xFFFF reserved by SigComp.

   *    The compressor MUST achieve the minimum_compression_ratio and
        MUST ensure that the message can be decompressed using no more
        than the resources available at the remote decompressor.

   The reason for preserving the message boundaries over a stream-based
   transport is that damage to one compressed message does not affect
   the decompression of subsequent messages. Moreover, the application
   typically vetoes state creation requests on a per-message basis.

   Note that SigComp also reserves the character 0xFF00 over a stream-
   based transport, and replaces every instance of 0xFF00 with 0xFF
   before decompressing the data. This ensures that arbitrary
   compression algorithms can be used over a stream-based transport,
   provided that every instance of 0xFF in the compressed data stream is
   identified and replaced with 0xFF00. This "byte-stuffing" scheme
   prevents the compression algorithm from inserting a message delimiter
   into the data stream where one is not required.

5.1.  Types of compression algorithm

   Any of the following classes of compression algorithm may be useful
   for particular applications:

   *    Generic compressor (for example [DEFLATE] or a similar
        algorithm).

   *    Protocol-aware compressor offering excellent performance for
        one particular type of data (for example the text messages
        generated by [SIP]).

   *    Hybrid compressor with similar performance to [DEFLATE] for
        generic data and superior performance for certain types of data.







Price, Hannu, et al.                                           [Page 16]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   Provided that the uncompressed data can be reconstructed at the UDVM
   using the available memory and CPU cycles, implementers have freedom
   to use a compression algorithm of their choice.

5.2.  Supplying bytecode to the UDVM

   A compressor MUST be certain that compressed data can be decompressed
   before the data is to be sent, i.e. the UDVM instructions for
   decompression MUST be available at the peer decompressor. Several
   options exist for ensuring that this bytecode is available:

   1. Each SigComp message sent from the compressor contains the
      necessary UDVM instructions for decompression.

   2. By setting up a reliable connection, such as a TCP connection,
      between a compressor and its peer decompressor the UDVM
      instructions can be transferred and saved as state.

   3. If there are predefined UDVM codes for well-known algorithms, a
      compressor only needs to send the state identifier of that UDVM
      decompression algorithm code to its peer decompressor. The
      decompressor can then populate the UDVM locally.

   In order to save delay for "time-critical" sessions, the UDVM
   instructions should be uploaded prior to any initiation of "time-
   critical" sessions.

5.3.  Compression failure

   The compressor SHOULD make every effort to successfully compress an
   application message, but in certain cases this might not be possible
   (particularly if a high minimum_compression_ratio has been set by the
   application). In this case a "compression failure" is called.

   Reasons for compression failure include the following:

   *    A compressed or uncompressed message exceeds the maximum size
        defined by the application.

   *    The minimum_compression_ratio cannot be achieved for a certain
        message.

   *    Insufficient resources are available at the compressor or at the
        remote decompressor.

   If a compression failure occurs when compressing a message then the
   compressor informs the dispatcher and takes no further action. The
   dispatcher can then report this failure to the application.







Price, Hannu, et al.                                           [Page 17]


INTERNET-DRAFT                  SigComp              February 14 , 2002


6.  State handling and capability announcement

   This chapter defines the behavior of the SigComp state handler. The
   function of the state handler is to retain information between
   successive SigComp messages; it is the only SigComp entity that is
   capable of this function, and so it is of particular importance from
   a security perspective.

6.1.  Storing and retrieving state

   To provide security against the malicious insertion of false
   compressed data, the UDVM memory is reset after each compressed
   message. This ensures that damaged compressed messages do not prevent
   the successful decompression of subsequent valid messages.
   Note however that the overall compression ratio is often
   significantly higher if messages can be compressed relative to the
   information stored in previous messages. For this reason it is
   possible to create "state" information for access when a later
   message is being decompressed.

   Both the creation and access of state are designed to be secure
   against malicious tampering with the compressed data. State can only
   be created when a complete message has been successfully
   decompressed, and the state handler MUST veto a state creation
   request if instructed by the application based on the contents of the
   decompressed message. This is especially useful if the application
   has an authentication mechanism that can be applied to determine
   whether the decompressed data is legitimate.

   Furthermore, a compressor can only access previously created state
   information by providing an [MD5] hash of the state to be accessed.
   The advantage of using a secure hash to access state information is
   that it is very difficult to guess the correct hash value without
   complete knowledge of the state being accessed.

   Also note that state is not deleted when it is accessed. So even if a
   malicious user manages to access state information, subsequent
   messages compressed relative to this state can still be successfully
   decompressed. Instead, the state handler is responsible for deleting
   state information once it determines that the state will no longer be
   needed.

   Each item of state stores the following information:

   Name:                      Type of data:

   state_identifier           16-byte value
   state start                2-byte value
   state_instruction          2-byte value
   state length               2-byte value
   state_value                String of bytes




Price, Hannu, et al.                                           [Page 18]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   The state_identifier must be supplied to retrieve an item of state
   from the state handler. State can be accessed using the UDVM
   instructions STATE-REFERENCE and STATE-EXECUTE, and can be created
   using the END-MESSAGE instruction.

   The state_value is a byte string that contains the actual value that
   is copied from/to the UDVM memory. The state_length specifies the
   number of bytes contained within state_value, and state_start gives
   the UDVM memory address from/to which the state_value is copied.

   Finally, state_instruction specifies the memory address of the next
   UDVM instruction to execute when state is accessed.

   The kind of information which is included in the state_value is up to
   a particular compressor and the uploaded instructions in the remote
   UDVM. However a compressor MUST not use a state that is not known to
   be established at the remote decompressor.

6.2. Guidelines for saving and deleting states

   [Editors' Note: Do we need something more?]

   A decompressor SHOULD NOT delete a state before it is confident
   enough that the state is not used by a peer compressor any more.

6.3.  Capability announcement

   The capability announcement information is used to modify the value
   of certain application-defined parameters. Since these parameter
   values are saved between SigComp messages, they are considered to be
   part of the overall state and hence are supplied from the UDVM to the
   state handler.

   If the state handler rejects a state creation request then the
   accompanying capability announcements MUST be rejected also.

   If the unidirectional version of SigComp is running then the
   capability announcement information is automatically discarded by the
   state handler.

   The following block of parameters is passed to the state handler
   using the appropriate UDVM instruction (currently the END-MESSAGE
   instruction):

   [Editors' Note: The capability announcement block is yet to be
   finalized. More items may be added in future.]









Price, Hannu, et al.                                           [Page 19]


INTERNET-DRAFT                  SigComp              February 14 , 2002


      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |       Total length        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |       UDVM_version        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |    overall_memory_size    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |      cycles_per_bit       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |    cycles_per_message     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Requested feedback length |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           |
      :    Requested feedback     :
      |                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Returned feedback length  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           |
      :     Returned feedback     :
      |                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 5: Capability announcement block

   Note that the three 2-byte length fields specify the lengths of the
   entire capability announcement block, the requested feedback data and
   the returned feedback data respectively. As usual, MSBs are stored
   preceding LSBs.

   The remaining items of data are explained in greater detail below:

6.3.1.  UDVM version

   The first 2 bytes of the capability announcement block specify
   whether only the basic version of the UDVM is available, or whether
   an upgraded version of the UDVM is available offering additional
   instructions etc.

   The basic version of the UDVM is Version 0, which is the version
   described in this document. Upgraded versions MUST be backwards-
   compatible with the basic version in the following sense:

   *    If some UDVM bytecode reaches the END-MESSAGE or DECOMPRESSION-
        FAILURE instructions when running on Version 0 of the UDVM, then
        the upgraded version MUST run the bytecode in an identical
        manner.

   This condition ensures that all bytecode that is valid for Version 0
   of the UDVM will continue to be valid for upgraded versions of the




Price, Hannu, et al.                                           [Page 20]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   UDVM. However, bytecode that is invalid on Version 0 of the UDVM
   (i.e. bytecode that produces a decompression failure that is not
   manually triggered) may become valid on upgraded versions.

   The simplest way to upgrade the UDVM in a backwards-compatible manner
   is to add additional UDVM instructions, as this will not affect the
   operation of existing UDVM bytecode.

6.3.2.  Memory size and CPU cycles

   The next 6 bytes of data specify new values for the application-
   defined parameters overall_memory_size, cycles_per_bit and
   cycles_per_message.

   Note that this data can only be used to increase the amount of
   resources available at the remote UDVM. If the data specifies a
   parameter value that is smaller than the value already possessed by
   the state handler, the parameter keeps its original value (i.e. the
   capability announcement data for this parameter is simply ignored).

   In particular, only allowing the parameter values to increase means
   that the announcement mechanism is robust against message loss or
   reordering.

   The parameters can only be restored to their original values if reset
   or renegotiated by the application.

6.3.3.  Requested feedback

   The requested feedback data is provided to the UDVM by the remote
   compressor. By providing this data, the remote compressor is
   requesting that the data be returned to the compressor via a reverse
   channel (assuming that one is present).

   The compressor in control of the reverse channel SHOULD return this
   data by uploading it into the returned feedback data block at the
   remote UDVM. The data will then be passed back to the remote
   compressor as explained below.

6.3.4.  Returned feedback

   The returned feedback data is an item of feedback data that has
   successfully returned from the remote entity. This data is passed to
   the local compressor (assuming that permission is granted by the
   application), which can make use of it as it wishes.

   Note that a compressor MUST only populate the returned feedback data
   with the bit-exact contents of a requested feedback data block
   previously provided to it.






Price, Hannu, et al.                                           [Page 21]


INTERNET-DRAFT                  SigComp              February 14 , 2002


7.  Overview of the UDVM

   Decompression functionality for SigComp is provided by a "Universal
   Decompressor Virtual Machine" (UDVM). The UDVM is a virtual machine
   much like the Java Virtual Machine but with a key difference: it is
   designed solely for the purpose of running decompression algorithms.

   The motivation for creating the UDVM is to provide unlimited
   flexibility when choosing how to compress a given item of data.
   Rather than picking one of a small number of pre-negotiated
   compression algorithms, the implementer has the freedom to select an
   algorithm of their choice. The compressed data is then combined with
   a set of UDVM instructions that allow the original data to be
   extracted, and the result is outputted as UDVM bytecode.

   Since the UDVM is optimized specifically for running decompression
   algorithms, the code size of a typical algorithm is small (often sub
   100 bytes). Moreover the UDVM approach does not add significant extra
   processing or memory requirements compared to running a fixed pre-
   programmed decompression algorithm.

   This chapter describes some basic features of the UDVM, including the
   memory allocation, well-known variables and instruction parameters.

7.1.  UDVM memory allocation

   The memory available to the UDVM is partitioned into a number of
   sections, providing space for program code, variables and
   miscellaneous data:

                  <----- working_memory_size ------>

   | Fixed values | Variables | Miscellaneous data | Program code |
   +--------------+-----------+--------------------+--------------+

   <--------------------- overall_memory_size -------------------->

                  Figure 6: Memory allocation in the UDVM

   Recall that the amount of memory available to the UDVM is specified
   by the application-defined parameters overall_memory_size,
   working_memory_start and working_memory_end. Note that all of these
   parameters are initialized by the application, but can be
   renegotiated on the fly using the capabilities announcement
   mechanism.

   The memory area from Address (working_memory_start) to Address
   (working_memory_end) inclusive can be used to store arbitrary data
   (variables, program code, Huffman codes etc.). UDVM instructions are
   allowed to read from or write to any address in this memory area.





Price, Hannu, et al.                                           [Page 22]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   The first part of this memory area is typically used to store a
   number of 2-byte variables. UDVM instructions can reference these
   variables using a special instruction parameter as described in
   Section 7.3.

   The memory area from Address 0 to Address (working_memory_start - 1)
   and from Address (working_memory_end + 1) to Address
   (overall_memory_size - 1) inclusive is write-protected, so UDVM
   instructions can read from this memory area but cannot write to it.
   This memory area is intended for storing UDVM bytecode that can be
   compiled.

   Any attempt to read memory addresses beyond the overall memory size
   or to write to memory addresses outside the working memory area MUST
   cause a decompression failure (see Section 8.3).

   The first part of the write-protected UDVM memory is intended for
   storing variables whose values no longer need to be modified. The
   second part of the write-protected memory is intended for storing
   program code including UDVM instructions and their associated
   parameters. Note that if an instruction references a variable that
   has been write-protected, the compiled version of the instruction
   will typically run faster than if the referenced variable lies in the
   working memory area.

7.2.  Well-known variables

   The first few variables in the UDVM memory have special tasks, for
   example specifying the location of the stack used by the CALL and
   RETURN instructions. Each of these well-known variables is a 2-byte
   integer.

   The following list gives the name of each well-known variable and the
   memory address at which the variable can be found:

   Name:           Starting memory address:

   byte_copy_left             0
   byte_copy_right            2
   stack_location             4

   The MSBs of each variable are always stored before the LSBs. So, for
   example, the MSBs of stack_location are stored at Address 4 whilst
   the LSBs are stored at Address 5.

   The use of each well-known variable is described in the following
   sections of the document.








Price, Hannu, et al.                                           [Page 23]


INTERNET-DRAFT                  SigComp              February 14 , 2002


7.3.  Instruction parameters

   Each of the UDVM instructions is followed by 0 or more bytes
   containing the parameters required by the instruction.

   To reduce the code size of a typical UDVM program, each parameter for
   a UDVM instruction is compressed using variable-length encoding. The
   aim is to store more common parameter values using fewer bits than
   rarely occurring values.

   Three different types of parameter are available: the literal, the
   reference and the multitype. The parameter types that follow each
   UDVM instruction are specified in Chapter 9.

   The UDVM bytecode for each parameter type is illustrated in Figure 7
   to Figure 9, together with the integer values represented by the
   bytecode.

   Note that the MSBs in the bytecode are illustrated as preceding the
   LSBs. Also, any string of bits marked with k consecutive "n"s is to
   be interpreted as an integer N from 0 to 2^k - 1 inclusive (with the
   MSBs of n illustrated as preceding the LSBs).

   The decoded integer value of the bytecode can be interpreted in two
   ways. In some cases it is taken to be the actual value of the
   parameter. In other cases it is taken to be a memory address at which
   the 2-byte parameter value can be found (MSBs found at the specified
   address, LSBs found at the following address). The latter case is
   denoted by memory[X] where X is the address and memory[X] is the 2-
   byte value starting at Address X.

   The simplest parameter type is the literal (#), which encodes a
   constant integer from 0 to 65535 inclusive. A literal parameter may
   require between 1 and 3 bytes depending on its value.

   Bytecode:                  Parameter value:         Range:

   0nnnnnnn                        N                   0 - 127
   10nnnnnn nnnnnnnn               N                   0 - 16383
   11000000 nnnnnnnn nnnnnnnn      N                   0 - 65535

              Figure 7: Bytecode for a literal (#) parameter

   The second parameter type is the reference ($), which is always used
   to access a 2-byte value located elsewhere in the UDVM memory. The
   bytecode for a reference parameter is decoded to be a constant
   integer from 0 to 65535 inclusive, which is interpreted as the memory
   address containing the actual value of the parameter.

   Note that reference parameters can always take values from 0 to 65535
   inclusive, as they reference 2-byte values.




Price, Hannu, et al.                                           [Page 24]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   Bytecode:                  Parameter value:         Range:

   0nnnnnnn                        memory[2 * N]       0 - 65535
   10nnnnnn nnnnnnnn               memory[2 * N]       0 - 65535
   11000000 nnnnnnnn nnnnnnnn      memory[N]           0 - 65535

             Figure 8: Bytecode for a reference ($) parameter

   The third kind of parameter is the multitype (%), which can be used
   to encode both actual values and memory addresses. The multitype
   parameter also offers efficient encoding for small integer values
   (both positive and negative) and for powers of 2.

   Bytecode:                  Parameter value:         Range:

   00nnnnnn                        N                   0 - 63
   01nnnnnn                        memory[2 * N]       0 - 65535
   1000011n                        2 ^ (N + 6)        64 , 128
   10001nnn                        2 ^ (N + 8)    256 , ... , 32768
   111nnnnn                        N + 65504       65504 - 65535
   1001nnnn nnnnnnnn               N + 61440       61440 - 65535
   101nnnnn nnnnnnnn               N                   0 - 8191
   110nnnnn nnnnnnnn               memory[N]           0 - 65535
   10000000 nnnnnnnn nnnnnnnn      N                   0 - 65535
   10000001 nnnnnnnn nnnnnnnn      memory[N]           0 - 65535

             Figure 9: Bytecode for a multitype (%) parameter

7.4.  Byte copying

   A number of UDVM instructions require a string of bytes to be copied
   to and from areas of the UDVM memory. This section defines how the
   byte copying operation should be performed.

   In general, the string of bytes is copied in ascending order of
   memory address. So if a byte is copied from/to Address n then the
   next byte is copied from/to Address n + 1. As usual, if a byte is
   read from an address beyond the overall memory size or is written to
   an address outside the working memory area then decompression failure
   occurs.

   Note however that if a byte is copied from/to the memory address
   specified in byte_copy_right, the byte copy operation continues by
   copying the next byte from/to the memory address specified in
   byte_copy_left. This is useful for setting up a "circular buffer"
   within the UDVM memory.

   Note that the string of bytes is copied on a purely byte-by-byte
   basis. In particular, some of the later bytes to be copied may
   themselves have been written into the UDVM memory by the byte copying
   operation currently being performed.




Price, Hannu, et al.                                           [Page 25]


INTERNET-DRAFT                  SigComp              February 14 , 2002



   Equally, it is possible for a byte copying operation to overwrite the
   instruction that called the byte copy. If this occurs then the byte
   copying operation MUST be completed as if the original instruction
   were still in place in the UDVM memory (this also applies if
   byte_copy_left or byte_copy_right are overwritten).

8.  Decompressing a SigComp message

   This chapter lists the steps involved in the decompression of a
   single SigComp message.

8.1.  Invoking the UDVM

   Whenever the dispatcher receives a message to be decompressed, it
   invokes a new instance of the UDVM. The overall_memory_size and
   initial contents of the UDVM memory are initialized using the
   corresponding application-defined parameters. The following steps are
   then taken:

   1.)   The number of remaining CPU cycles is set equal to the
   application-defined parameter cycles_per_message.

   Notes:

   The amount of compressed data available to the UDVM is exactly one
   compressed message. If the transport is stream-based then SigComp
   uses the reserved byte string 0xFFFF to delimit the compressed
   messages: the dispatcher takes the data between a pair of neighboring
   reserved byte strings to be a single compressed message. The reserved
   byte string itself is not considered to be part of the compressed
   message.

   The compressed data is not provided to the UDVM by default. Instead,
   the UDVM requests compressed data using the INPUT instructions
   (useful when running over a stream-based transport since there is no
   need to wait for the entire compressed message before decompression
   can begin). Note that in particular, this means that the application
   MUST define the initial contents of the UDVM memory to contain at
   least one INPUT instruction. See Section 4.2 for an example of how
   the application might initialize the UDVM memory.

   The dispatcher MUST NOT make more than one compressed message
   available to a given instance of the UDVM. In particular, the
   dispatcher MUST NOT concatenate two messages to form a single
   compressed message. This is because compressed messages are typically
   padded with trailing zero bits so that they are a whole number of
   bytes long. Concatenating two messages would cause these padding bits
   to be incorrectly interpreted as compressed data.






Price, Hannu, et al.                                           [Page 26]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   2.)   Next, the instructions contained within the UDVM memory are
   executed beginning at the address specified in first_instruction.

   Notes:

   The instructions are executed consecutively unless otherwise
   indicated (for example when the UDVM encounters a JUMP instruction).

   If the next instruction to be executed lies outside the available
   memory then decompression failure occurs (see Section 8.3).

   3.)   Each time an instruction is executed the number of available
   CPU cycles is decreased by the amount specified in Chapter 9.
   Additionally, if the UDVM requests n bits of compressed data (using
   one of the INPUT instructions) then the number of available CPU
   cycles is increased by n * cycles_per_bit.

   Notes:

   This means that the total number of CPU cycles available for
   processing a compressed message is given by the formula:

     total_cycles = cycles_per_message + message_size * cycles_per_bit

   The reason that this total is not allocated to the UDVM when it is
   invoked is that the UDVM can begin to decompress a message that has
   only been partially received. So the total message size may not be
   known when the UDVM is initialized.

   4.)   The UDVM stops executing instructions when it encounters an
   END-MESSAGE instruction or if decompression failure occurs.

   Notes:

   The UDVM passes uncompressed data to the dispatcher using the OUTPUT
   instruction. The OUTPUT instruction can be used to output a partially
   decompressed message; it is a dispatcher decision whether to use the
   data immediately or whether to buffer and wait until the entire
   message has been decompressed.

   The UDVM passes state creation requests to the state handler using
   the END-MESSAGE instruction. This means that it is only possible to
   make a state creation request once the message has been decompressed,
   which is necessary since the application typically determines the
   validity of these requests based on the contents of the decompressed
   message.

8.2.  Successful decompression

   The END-MESSAGE instruction indicates that the compressed message has
   been successfully decompressed and passed to the dispatcher. Note




Price, Hannu, et al.                                           [Page 27]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   that the actual uncompressed message is outputted beforehand using
   the OUTPUT instruction; this allows the UDVM to output each part of
   the message to the dispatcher as soon as it has been decompressed.

   The END-MESSAGE instruction provides two additional pieces of
   information to the state handler: the state creation request and the
   capability announcement block. The state creation request mechanism
   is discussed below:

   The UDVM may optionally save part of its memory for retrieval by
   later messages. However to prevent malicious storage of a large
   amount of unnecessary state information, the application itself MUST
   give permission before any state can be created. The state handler
   typically makes a decision on whether state can be created based on
   the contents of the decompressed message, particularly if the message
   contains authentication data that can verify whether or not the
   sender is legitimate.

   The END-MESSAGE instruction requests the creation of state using the
   parameters state start and state length, which together denote a byte
   string state_value. Provided that the application gives permission,
   state_value is byte copied from the UDVM memory (obeying the rules of
   Section 7.4) and stored together with a 16-byte state identifier that
   can be used to access the state by a later compressed message.

   To provide security against malicious access, the identifier for any
   item of state created by the UDVM is derived from the [MD5] hash of
   the state_value to be stored. The state identifier is constructed by
   taking the 16-byte [MD5] hash and replacing all but the first
   hash_length most significant bytes with zeroes. Note that if
   hash_length is 16 then the unmodified [MD5] hash is the state
   identifier. Decompression failure occurs if hash_length is less than
   the application-defined parameter minimum_hash_size or greater than
   16.

   If a state identifier already exists (hash collision occurs), the
   decompressor should check whether the requested state is identical to
   the established state, and count the state creation request as
   successful if this is the case.

   If not then the state creation request is unsuccessful. The existing
   state MUST NOT be replaced with the requested state to be saved. This
   is to avoid the situation where a compressed message cannot be
   decompressed because a needed item of state has been replaced
   (possibly by a malicious sender).

   Each item of state stores the following information (accessed by the
   state_identifier):







Price, Hannu, et al.                                           [Page 28]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   Name:                      Type of data:

   state_identifier           16-byte value
   state start                2-byte value
   state_instruction          2-byte value
   state length               2-byte value
   state_value                String of bytes

   Note that state_start, state_length and state_instruction are all
   parameters from the END-MESSAGE instruction, whereas state_identifier
   and state_value are created as specified above.

   This state can subsequently be accessed by using the STATE-REFERENCE
   and STATE-EXECUTE instructions (by providing the correct state
   identifier).

8.3.  Decompression failure

   If a compressed message given to the UDVM is corrupted (either
   accidentally or maliciously) then the UDVM may terminate with a
   decompression failure.

   Reasons for decompression failure include the following:

   *    A compressed or uncompressed message exceeds the maximum size
        defined by the application.

   *    The UDVM exceeds the available CPU cycles for decompressing a
        message.

   *    The UDVM attempts to read a memory address beyond the overall
        memory size, or to write into a memory address outside the
        working memory area.

   *    An unknown instruction type is encountered.

   *    An unknown parameter type is encountered.

   *    An instruction is encountered that cannot be processed
        successfully by the UDVM (for example a RETURN instruction when
        no CALL instruction has previously been encountered).

   *    The UDVM attempts to access non-existent state.

   *    A manual decompression failure is triggered using the
        DECOMPRESSION-FAILURE instruction.

   If a decompression failure occurs when decompressing a message then
   the UDVM informs the dispatcher and takes no further action. It is
   the responsibility of the dispatcher to decide how to cope with the





Price, Hannu, et al.                                           [Page 29]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   decompression failure. In general a dispatcher SHOULD discard the
   compressed message and any decompressed data that has been outputted.

9.  UDVM instruction set

   The UDVM currently understands 28 instructions, chosen to support the
   widest possible range of compression algorithms with the minimum
   possible overhead.

   Figure 10 lists the different instructions and the bytecode values
   used to store the instructions at the UDVM. The cost of each
   instruction in CPU cycles is also given:

   Instruction:     Bytecode value:   Cost in CPU cycles:

   DECOMPRESSION-FAILURE     0          1
   AND                       1          1
   OR                        2          1
   NOT                       3          1
   ADD                       4          1
   SUBTRACT                  5          1
   MULTIPLY                  6          1
   DIVIDE                    7          1
   LOAD                      8          1
   MULTILOAD                 9          1 + n
   WORKING-MEMORY            10         1
   COPY                      11         1 + length
   COPY-LITERAL              12         1 + length
   COPY-OFFSET               13         1 + length + offset
   JUMP                      14         1
   COMPARE                   15         1
   CALL                      16         1
   RETURN                    17         1
   SWITCH                    18         1 + n
   CRC                       19         1 + length
   END-MESSAGE               20         1 + state length
   OUTPUT                    21         1 + output_length
   NBO                       22         1
   INPUT-BYTECODE            23         1 + length
   INPUT-FIXED               24         1
   INPUT-HUFFMAN             25         1 + n
   STATE-REFERENCE           26         1 + state_length
   STATE-EXECUTE             27         1 + state length

      Figure 10: UDVM instructions and corresponding bytecode values

   Each UDVM instruction costs a minimum of 1 CPU cycle. Certain high-
   level instructions may cost additional cycles depending on the value
   of one of the instruction parameters.






Price, Hannu, et al.                                           [Page 30]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   The only exception when calculating the number of CPU cycles is that
   the STATE-EXECUTE instruction takes (1 + state_length) cycles even
   though it does not have a state_length parameter; instead the value
   of state length is provided by the state handler as part of the state
   being accessed.

   All instructions are stored as a single byte to indicate the
   instruction type, followed by 0 or more bytes containing the
   parameters required by the instruction. The instruction specifies
   which of the three parameter types of Section 7.3 is used in each
   case. For example, the ADD instruction is followed by two parameters
   as shown below:

   ADD ($parameter_1, %parameter_2)

   When converted into bytecode the number of bytes required by the ADD
   instruction depends on the size of each parameter value, and whether
   the second (multitype) parameter contains the parameter value itself
   or a memory address where the actual value of the parameter can be
   found.

   The instruction set available for the UDVM offers a mix of low-level
   and high-level instructions. The high-level instructions can all be
   emulated using the low-level instructions provided, but given a
   choice it is generally preferable to use a single instruction rather
   than a large number of general-purpose instructions. The resulting
   bytecode will be more compact (leading to a higher overall
   compression ratio) and decompression will typically be faster because
   the implementation of the compression-specific instructions can be
   optimized for the UDVM.

   Each instruction is explained in more detail below:

9.1.  Bit manipulation instructions

   The AND, OR and NOT instructions provide simple bit manipulation on
   2-byte words.

   AND ($parameter_1, %parameter_2)
   OR ($parameter_1, %parameter_2)
   NOT ($parameter_1)

   After the operation is complete, the value of the first parameter is
   overwritten with the result. Note that since this parameter is a
   reference, the memory address specified by the parameter is always
   overwritten and not the parameter itself.

9.2.  Arithmetic instructions

   The ADD, SUBTRACT, MULTIPLY and DIVIDE instructions perform
   arithmetic on 2-byte words.




Price, Hannu, et al.                                           [Page 31]


INTERNET-DRAFT                  SigComp              February 14 , 2002



   ADD ($parameter_1, %parameter_2)
   SUBTRACT ($parameter_1, %parameter_2)
   MULTIPLY ($parameter_1, %parameter_2)
   DIVIDE ($parameter_1, %parameter_2)

   After the operation is complete, the first parameter is overwritten
   with the result.

   Note that in all cases the arithmetic operation is performed modulo
   2^16. So for example, subtracting 1 from 0 gives the result 65535.

   For the SUBTRACT instruction the second parameter is subtracted from
   the first. Similarly, for the DIVIDE instruction the first parameter
   is divided by the second parameter. Note that if the second parameter
   does not divide exactly into the first parameter then the remainder
   is ignored.

9.3.  Memory management instructions

   The following instructions are used to manipulate the UDVM memory.
   Bytes can be copied from one area of memory to another, and areas of
   memory can be write-protected to make it easier for UDVM code to be
   compiled.

9.3.1.  LOAD

   The LOAD instruction sets a 2-byte variable to a certain specified
   value. The format of a LOAD instruction is as follows:

   LOAD (%address, %value)

   The first parameter specifies the starting address of the 2-byte
   variable, whilst the second parameter specifies the value to be
   loaded into this variable. As usual, MSBs are stored before LSBs in
   the UDVM memory.

9.3.2.  MULTILOAD

   The MULTILOAD instruction sets a contiguous block of 2-byte variables
   to specified values.

   MULTILOAD (%address, #n, %value_0, ..., %value_n-1)

   The first parameter specifies the starting address of the contiguous
   variables, whilst the parameters value_0 through to value_n-1 specify
   the values to load into these variables (in the same order as they
   appear in the instruction).







Price, Hannu, et al.                                           [Page 32]


INTERNET-DRAFT                  SigComp              February 14 , 2002


9.3.3.  WORKING-MEMORY

   The WORKING-MEMORY instruction is used to prevent part of the UDVM
   memory from being modified. This can be very useful when offering
   UDVM code for compilation.

   WORKING-MEMORY (%memory_start, %memory_end)

   The parameters memory_start and memory_end specify the new working
   memory area for the UDVM. These parameters replace the application-
   defined parameters working_memory_start and working_memory_end, but
   only while the current message is being decompressed. When a new
   instance of the UDVM is invoked the working memory area is set by the
   original application-defined parameters.

   If memory_end < memory_start, or if the parameters reference a memory
   address beyond the overall UDVM memory size, then decompression
   failure occurs.

   After the WORKING-MEMORY instruction has been encountered, the only
   way to write into UDVM memory within the protected region is to
   cancel the protection using another WORKING-MEMORY instruction (or to
   invoke a new instance of the UDVM).

9.3.4.  COPY

   The COPY instruction is used to copy a string of bytes from one part
   of the UDVM memory to another.

   COPY (%position, %length, %destination)

   The position parameter specifies the memory address of the first byte
   in the string to be copied, and the length parameter specifies the
   number of bytes to be copied.

   The destination parameter gives the address to which the first byte
   in the string will be copied.

   Note that byte copying is performed as per the rules of Section 7.4.

9.3.5.  COPY-LITERAL

   A modified version of the COPY instruction is given below:

   COPY-LITERAL (%position, %length, $destination)

   The COPY-LITERAL instruction behaves as a COPY instruction except
   that after copying, the destination parameter is replaced with the
   memory address immediately following the address to which the final
   byte was copied. If the final byte was copied to the memory address





Price, Hannu, et al.                                           [Page 33]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   specified in byte_copy_right, the destination parameter is set to the
   memory address specified in byte_copy_left.

9.3.6.  COPY-OFFSET

   A further version of the COPY-LITERAL instruction is given below:

   COPY-OFFSET (%offset, %length, $destination)

   The COPY-OFFSET instruction behaves as a COPY-LITERAL instruction
   except that an offset parameter is given instead of a position
   parameter.

   To derive a suitable position parameter, starting at the memory
   address specified by destination, the UDVM counts backwards a total
   of offset memory addresses. If the memory address specified in
   byte_copy_left is reached, the next memory address is taken to be
   byte_copy_right.

   The COPY-OFFSET instruction then behaves as a COPY-LITERAL
   instruction, taking the position parameter to be the last memory
   address reached in the above step.

9.4.  Program flow instructions

   The following instructions alter the flow of UDVM code. Each
   instruction jumps to one of a number of memory addresses based on a
   certain specified criterion. Note that all of the instructions give
   the memory addresses in the form of deltas relative to the memory
   address of the instruction. The actual memory address is calculated
   as follows:

   memory_address = (memory_address_of_instruction + delta) modulo 2^16

   Note that certain I/O instructions (see Section 9.5) can also alter
   program flow.

9.4.1.  JUMP

   The JUMP instruction moves program execution to the specified memory
   address.

   JUMP (%delta)

   Note that if the address (specified as a delta from the address of
   the JUMP instruction) lies beyond the overall UDVM memory size then
   decompression failure occurs.








Price, Hannu, et al.                                           [Page 34]


INTERNET-DRAFT                  SigComp              February 14 , 2002


9.4.2.  COMPARE

   The COMPARE instruction compares two parameters and then jumps to one
   of three specified memory addresses depending on the result.

   COMPARE (%parameter_1, %parameter_2, %delta_1, %delta_2, %delta_3)

   If parameter_1 < parameter_2 then the UDVM continues instruction
   execution at the (relative) memory address specified by delta 1. If
   parameter_1 = parameter_2 then it jumps to the address specified by
   delta_2. If parameter_1 > parameter_2 then it jumps to the address
   specified by delta_3.

9.4.3.  CALL and RETURN

   The CALL and RETURN instructions provide support for compression
   algorithms with a nested structure.

   CALL (%delta)

   RETURN

   The CALL and RETURN instructions make use of a stack of 2-byte
   variables stored at the memory address specified by the well-known
   variable stack_location. The stack contains the following variables:

   Name:           Starting memory address:

   stack_free            stack_location
   stack[0]              stack_location + 2
   stack[1]              stack_location + 4
   stack[2]              stack_location + 6
      :                       :

   The MSBs of these variables are stored before the LSBs in the UDVM
   memory.

   When the UDVM reaches a CALL instruction, it finds the memory address
   of the instruction immediately following the CALL instruction and
   copies this 2-byte value into stack[stack_free] ready for later
   retrieval. It then increases stack_free by 1 and continues
   instruction execution at the (relative) memory address specified by
   the parameter.

   When the UDVM reaches a RETURN instruction it decreases stack_free by
   1, and then continues instruction execution at the byte position
   stored in stack[stack_free].

   If the variable stack_free is ever increased beyond 65535 or
   decreased below 0 then a bad compressed message has been received and
   decompression failure occurs (see Section 8.3).




Price, Hannu, et al.                                           [Page 35]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   Decompression failure also occurs if one of the above instructions is
   encountered and the value of stack_location is smaller than 6 (this
   prevents the stack from overwriting the well-known variables).

9.4.4.  SWITCH

   The SWITCH instruction performs a conditional jump based on the value
   of one of its parameters.

   SWITCH (#n, %j, %delta_0, %delta_1, ... , %delta_n-1)

   When a SWITCH instruction is encountered the UDVM reads the value of
   j. It then continues instruction execution at the (relative) address
   specified by delta j.

   If j specifies a value of n or more, a bad compressed message has
   been received and decompression failure occurs.

9.4.5.  CRC

   The CRC instruction verifies a string of bytes using a 2-byte CRC.

   CRC (%value, %position, %length, %delta)

   The actual CRC calculation is performed using the generator
   polynomial x^16 + x^12 + x^5 + 1, which coincides with the 2-byte
   Frame Check Sequence (FCS) of [RFC-1662].

   The position and length parameters define the string of bytes over
   which the CRC is evaluated. Byte copying rules are enforced as per
   Section 7.4.

   Important note: Since a CRC calculation is always performed over a
   bitstream, for interoperability it is necessary to define the order
   in which bits are supplied within each individual byte. In this case
   the MSBs of the byte MUST be supplied to the CRC calculation before
   the LSBs.

   The value parameter contains the expected integer value of the 2-byte
   CRC. If the calculated CRC matches the expected value then the UDVM
   continues at the following instruction. Otherwise the UDVM jumps to
   the (relative) memory address specified by delta.

9.5.  I/O instructions

   The following instructions allow the UDVM to interface with its
   environment. Note that in the overall SigComp architecture all of
   these interfaces pass to the decompressor dispatcher or to the state
   handler.






Price, Hannu, et al.                                           [Page 36]


INTERNET-DRAFT                  SigComp              February 14 , 2002


9.5.1.  END-MESSAGE

   The END-MESSAGE instruction successfully terminates the UDVM and
   passes state information to the state handler.

   END-MESSAGE (%hash_length, %state_start, %state_length,
   %state_instruction, %capability_announcement_location)

   The actions taken by the UDVM upon encountering the END-MESSAGE
   instruction are described in Section 8.2. Note also that the
   capability_announcement_location parameter points to the starting
   memory address of the capability announcement block of Section 6.3.

9.5.2.  DECOMPRESSION-FAILURE

   The DECOMPRESSION-FAILURE instruction triggers a manual decompression
   failure. This is useful if the UDVM program discovers that it cannot
   successfully decompress the message (e.g. by using the CRC
   instruction).

   This instruction has no parameters.

9.5.3.  OUTPUT

   The OUTPUT instruction provides successfully decompressed data to the
   dispatcher.

   OUTPUT (%output_start, %output_length)

   The parameters define the starting memory address and length of the
   byte string to be provided to the dispatcher. Note that the OUTPUT
   instruction can be used to output a partially decompressed message;
   each time the instruction is encountered it appends a byte string to
   the end of the data previously passed to the dispatcher via the
   OUTPUT instruction.

   The string of data is byte copied from the UDVM memory obeying the
   rules of Section 7.4.

   Decompression failure occurs if the cumulative number of bytes
   provided to the dispatcher exceeds the application-defined parameter
   maximum_uncompressed_size.

   Since there is technically a difference between outputting a 0-byte
   decompressed message, and not outputting a decompressed message at
   all, the OUTPUT instruction needs to distinguish between the two
   cases. Thus, if the UDVM terminates before encountering an OUTPUT
   instruction it is considered not to have outputted a decompressed
   message. If it encounters one or more OUTPUT instructions, each of
   which provides 0 bytes of data to the dispatcher, then it is
   considered to have outputted a 0-byte decompressed message.




Price, Hannu, et al.                                           [Page 37]


INTERNET-DRAFT                  SigComp              February 14 , 2002


9.5.4.  NBO

   The NBO instruction modifies the order in which compressed bits are
   passed to the UDVM.

   As the INPUT-FIXED and INPUT-HUFFMAN instructions read individual
   bits from within a byte, to avoid ambiguity it is necessary to define
   the order in which these bits are read. The default operation is to
   read the MSBs before the LSBs, but if the NBO instruction is
   encountered then the LSBs are read before the MSBs. Both cases are
   illustrated below:

    MSB         LSB MSB         LSB     MSB         LSB MSB         LSB

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 2 3 4 5 6 7|8 9 ...        |   |7 6 5 4 3 2 1 0|        ... 9 8|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Byte 0          Byte 1              Byte 0          Byte 1

           Default operation            After NBO instruction

   The NBO instruction can only be used before bitwise compressed data
   is passed to the UDVM. Therefore, a decompression failure occurs if
   it is encountered after an INPUT-FIXED or an INPUT-HUFFMAN
   instruction has been used.

9.5.5.  INPUT-BYTECODE

   The INPUT-BYTECODE instruction requests a certain number of bytes of
   compressed data from the dispatcher.

   INPUT-BYTECODE (%length, %destination, %delta)

   The length parameter indicates the requested number of bytes of
   compressed data, and the destination parameter specifies the starting
   memory address to which they should be copied. Byte copying is
   performed as per the rules of Section 7.4.

   If the instruction requests data that lies beyond the end of the
   compressed message, no data is returned. Instead the UDVM moves
   program execution to the memory address specified by the formula
   (memory_address_of_INPUT-BYTECODE_instruction + delta) modulo 2^16.

   The INPUT-BYTECODE instruction can only be used before bitwise
   compressed data is passed to the UDVM. Therefore, a decompression
   failure occurs if it is encountered after an INPUT-FIXED or an INPUT-
   HUFFMAN instruction has been used.







Price, Hannu, et al.                                           [Page 38]


INTERNET-DRAFT                  SigComp              February 14 , 2002


9.5.6.  INPUT-FIXED

   The INPUT-FIXED instruction requests a certain number of bits of
   compressed data from the dispatcher.

   INPUT-FIXED (%length, %destination, %delta)

   The length parameter indicates the requested number of bits. If this
   parameter does not lie between 1 and 16 inclusive then a
   decompression failure occurs.

   The destination parameter specifies the memory address to which the
   compressed data should be copied. Note that the requested bits are
   interpreted as a 2-byte integer ranging from 0 to 2^length - 1. Under
   default operation the MSBs of this integer are provided first, but if
   an NBO instruction has been executed then the LSBs are provided
   first.

   If the instruction requests data that lies beyond the end of the
   compressed message, no data is returned. Instead the UDVM moves
   program execution to the memory address specified by the formula
   (memory_address_of_INPUT-FIXED_instruction + delta) modulo 2^16.


9.5.7.  INPUT-HUFFMAN

   The INPUT-HUFFMAN instruction requests a variable number of bits of
   compressed data from the dispatcher. The instruction initially
   requests a small number of bits and compares the result against a
   certain criterion; if the criterion is not met then additional bits
   are requested until the criterion is achieved.

   The INPUT-HUFFMAN instruction is followed by three mandatory
   parameters plus n additional sets of parameters. Every additional set
   contains four parameters as shown below:

   INPUT-HUFFMAN (%destination, %delta, #n, %bits_1, %lower_bound_1,
   %upper_bound_1, %uncompressed_1, ... , %bits_n, %lower_bound_n,
   %upper_bound_n, %uncompressed_n)

   Note that if n = 0 then the INPUT-HUFFMAN instruction is ignored by
   the UDVM. If bits_1 = 0 or (bits_1 + ... + bits_n) > 16 then
   decompression failure occurs.

   In all other cases, the behavior of the INPUT-HUFFMAN instruction is
   defined below:

   1.)   Set j = 1.

   2.)   Request an additional bits_j compressed bits. Interpret the
   total (bits_1 + ... + bits_j) bits of compressed data requested so




Price, Hannu, et al.                                           [Page 39]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   far as an integer H, with the first bit to be supplied as the MSB and
   the last bit to be supplied as the LSB (note that this is always the
   case, independently of whether the NBO instruction has been used).

   3.)   If data is requested that lies beyond the end of the compressed
   message, terminate the INPUT-HUFFMAN instruction and move program
   execution to the memory address specified by the formula
   (memory_address_of_INPUT-HUFFMAN_instruction + delta) modulo 2^16.

   4.)   If (H < lower_bound_j) or (H > upper_bound_j) then set j = j +
   1. Then go back to Step 2, unless j > n in which case decompression
   failure occurs.

   5.)   Copy (H + uncompressed_j - lower_bound_j) modulo 2^16 to the
   memory address specified by the destination parameter.

9.5.8.  STATE-REFERENCE

   The STATE-REFERENCE instruction retrieves some previously stored
   state information.

   STATE-REFERENCE (%id_start, %id_length, %state_start, %state_length,
   %state_destination)

   The id_start and id_length parameters specify the location of the
   state identifier used to retrieve the state information. The state
   identifier is always 16 bytes long; if id_length is less than 16 then
   the remaining least significant bytes of the identifier are padded
   with zeroes.

   Decompression failure occurs if id_length is greater than 16.
   Decompression failure also occurs if no state information matching
   the state identifier can be found.

   Note that when accessing state information that has been previously
   created by the UDVM, the state identifier is always taken from an
   [MD5] hash of the state to be retrieved. However this is not
   necessarily the case for application-defined state as per Section
   3.2.

   The state_start and state_length parameters define the starting byte
   and number of bytes to copy from the state_value contained in the
   identified item of state. If more state is requested than is actually
   available then decompression failure occurs.

   The state_destination parameter contains a UDVM memory address. The
   requested state is byte copied to this memory address using the rules
   of Section 7.4.







Price, Hannu, et al.                                           [Page 40]


INTERNET-DRAFT                  SigComp              February 14 , 2002


9.5.9.  STATE-EXECUTE

   The STATE-EXECUTE instruction retrieves and runs some previously
   stored state information.

   STATE-EXECUTE (%id_start, %id_length)

   The id_start and id_length parameters function as per the STATE-
   REFERENCE instruction.

   STATE-EXECUTE is similar to STATE-REQUEST except that it does not
   require the amount of state being requested or the proposed
   destination for the state to be specified explicitly. Instead, it
   simply puts the state_value back into the UDVM memory using the
   parameters state_start and state_length contained as part of the
   state information.

   The entire state_value (all state length bytes of it) is byte copied
   into the memory address specified by state start. The UDVM then jumps
   to the (absolute) memory address specified by state_instruction.

   Note that state start, state length and state_instruction are all
   stored together with state_value as part of an item of state
   information.


10. Security considerations

10.1 Security goals

   The overall security goal of the SigComp architecture is to not
   create risks that are in addition to those already present in the
   application protocols. There is no intention for SigComp to enhance
   the security of the protocols, as it always can be circumvented by
   not using compression. More specifically, the high-level security
   goals can be described as:

   -- do not worsen security of existing application protocol

   -- do not create any new security issues

   -- do not hinder deployment of application security


10.2 Security risks and mitigations

   This subsection identifies the potential security risks associated
   with the overall SigComp architecture, and details the proposed
   solution for each risk.






Price, Hannu, et al.                                           [Page 41]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   ** Confidentiality risks

   *** Attacking SigComp by snooping into state of other users

   State can only be accessed using a state identifier, which is a
   (prefix of a) cryptographic hash of the state being referenced. This
   implies that the referencing packet already needs knowledge about the
   state. To enforce this, a reference length of 48 bits is defined.
   This also minimizes the probability of an accidental state collision.

   Generally, ways to obtain knowledge about the state identifier (e.g.,
   passive attacks) will also easily provide knowledge about the state
   referenced, so no new vulnerability results.

   The application needs to handle state identifiers with the same care
   it would handle the state itself.

   ** Integrity risks

   The SigComp approach assumes that there is appropriate integrity
   protection below and/or above the SigComp layer. However, the state
   establishment mechanism provides additional potential to compromise
   the integrity of the messages (which, however, would most likely be
   detectable at the application layer).

   *** Attacking SigComp by faking state or making unauthorized changes
   to state

   State cannot be destroyed or changed by a malicious sender -- it can
   only add new state. Faking state is only possible if the hash allows
   intentional collision.

   ** Availability risks (avoid DoS vulnerabilities)

   *** Use of SigComp as a tool in a DoS attack to another target

   SigComp cannot easily be used as an amplifier in a reflection attack,
   as it only generates one decompressed message per incoming compressed
   message. This packet is then handed to the application; the utility
   as a reflection amplifier is therefore limited by the utility of the
   application.

   However, it must be noted that SigComp can be used to generate larger
   packets as input to the application than have to be sent from the
   malicious sender; this therefore can send smaller packets (at a lower
   bandwidth) than are delivered to the application. Depending on the
   reflection characteristics of the application, this can be considered
   a mild form of amplification. The application MUST limit the number
   of packets reflected to a potential target -- even if SigComp is used
   to generate a large amount of information from a small incoming
   attack packet.




Price, Hannu, et al.                                           [Page 42]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   *** Attacking SigComp as the DoS target by filling it with state

   Excessive state can only be installed by a malicious sender (or a set
   of malicious senders) with the consent of the application. The system
   consisting of SigComp and application is thus approximately as
   vulnerable as the application itself, unless it allows the
   installation of state from a message where it would not have
   installed state itself.

   If this is desirable to increase the compression ratio, the effect
   can be mitigated by adding feedback at the application level that
   indicates whether the state requested was actually installed -- This
   allows a system under attack to gracefully degrade by no longer
   installing compressor state that is not matched by application state.

   *** Attacking the UDVM by faking state or making unauthorized changes
   to state

    (See "Integrity risks" above.)

   *** Attacking the UDVM by sending it looping code

   The application sets an upper limit to the number of "CPU cycles"
   that can be used per compressed message and per input bit in the
   compressed message. The damage inflicted by sending packets with
   looping code is therefore limited, although this may still be
   substantial if a large number of CPU cycles are offered by the UDVM.
   However, this would be true for any decompressor that can receive
   packets from anywhere.


11. IANA considerations

   The SigComp solution currently requires two identifiers to be
   assigned by IANA: the UDVM_version and the state identifier.

   Upgraded versions of the UDVM will contain additional instructions to
   improve the performance of the overall SigComp solution; new
   UDVM_version parameters will be needed in this case.

   Well-known decompression algorithms will also need to be assigned
   fixed state identifiers.


12. Acknowledgements

   Thanks to

            Abigail Surtees (abigail.surtees@roke.co.uk)
            Mark A West (mark.a.west@roke.co.uk)
            Lawrence Conroy (lwc@roke.co.uk)




Price, Hannu, et al.                                           [Page 43]


INTERNET-DRAFT                  SigComp              February 14 , 2002


            Christian Schmidt (christian.schmidt@icn.siemens.de)
            Max Riegel (maximilian.riegel@icn.siemens.de)
            Lars-Erik Jonsson (lars-erik.jonsson@epl.ericsson.se)
            Stefan Forsgren (stefan.forsgren@epl.ericsson.se)
            Krister Svanbro (krister.svanbro@epl.ericsson.se)
            Christopher Clanton (christopher.clanton@nokia.com)
            Khiem Le (khiem.le@nokia.com)
            Ka Cheong Leung (kacheong.leung@nokia.com)

   for valuable input and review.


13. AuthorsÆ addresses

   Richard Price         Tel: +44 1794 833681
   Email:                richard.price@roke.co.uk

   Roke Manor Research Ltd
   Romsey, Hants, SO51 0ZN
   United Kingdom


   Hans Hannu            Tel: +46 920 20 21 84
   Email:                hans.hannu@epl.ericsson.se

   Box 920
   Ericsson Erisoft AB
   SE-971 28 Lulea, Sweden


   Carsten Bormann       Tel: +49 421 218 7024
   Email:                cabo@tzi.org

   Universitaet Bremen TZI
   Postfach 330440
   D-28334 Bremen, Germany


   Jan Christoffersson   Tel: +46 920 20 28 40
   Email:                jan.christoffersson@epl.ericsson.se

   Box 920
   Ericsson Erisoft AB
   SE-971 28 Lulea, Sweden


   Zhigang Liu           Tel: +1 972 894-5935
   Email:                zhigang.liu@nokia.com

   Nokia Research Center
   6000 Connection Drive




Price, Hannu, et al.                                           [Page 44]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   Irving, TX 75039
   USA


   Jonathan Rosenberg
   Email:                jdrosen@dynamicsoft.com

   dynamicsoft
   72 Eagle Rock Avenue
   First Floor
   East Hanover, NJ 07936


14. References

   [SIP]       "SIP: Session Initiation Protocol", Handley et al,
               RFC 2543, Internet Engineering Task Force, March 1999

   [RTSP]      "Real Time Streaming Protocol (RTSP)", H. Schulzrinne, A.
               Rao and R. Lanphier, , RFC 2326, April 1998

   [HTTP]      "HyperText Transfer Protocol, HTTP/1.1", R. Fielding et
               al.", RFC 2616, June 1999

   [SIPsrv]    "SIP: Locating SIP Servers", J. Rosenberg, H.
               Schulzrinne, draft-ietf-sip-srv-04.txt, January 2002,
               work in progress

   [DEFLATE]   "DEFLATE Compressed Data Format Specification version
               1.3", P. Deutsch, RFC 1951, Internet Engineering Task
               Force, May 1996

   [SCTP]      "Stream Control Transmission Protocol", Stewart et al,
               RFC 2960, Internet Engineering Task Force, October 2000

   [MD5]       "The MD5 Message-Digest Algorithm", R. Rivest, RFC 1321,
               Internet Engineering Task Force, April 1992

   [RFC-1662]  "PPP in HDLC-like Framing", Simpson et al, Internet
               Engineering Task Force, July 1994

   [RFC-2026]  "The Internet Standards Process - Revision 3", Scott
               Bradner, Internet Engineering Task Force, October 1996

   [RFC-2119]  "Key words for use in RFCs to Indicate Requirement
               Levels", Scott Bradner, Internet Engineering Task Force,
               March 1997








Price, Hannu, et al.                                           [Page 45]


INTERNET-DRAFT                  SigComp              February 14 , 2002


Appendix A. Mnemonic language

   Writing UDVM programs directly in bytecode would be a daunting task,
   so a simple mnemonic language is provided to facilitate the creation
   of new decompression algorithms. Most importantly, the language
   allows the parameters of an instruction to be specified as text names
   rather than as integer values.

   If an instruction parameter is given as a text name, it should
   correspond to exactly one instance of a label, a reserved memory
   address or an externally defined keyword. A label is simply a text
   name preceded by a colon, for example:

   :loop
   JUMP (loop)

   For any parameters corresponding to a label, the integer value of the
   parameter is calculated by the following formula:

    parameter_value = (instruction_address - label_address) modulo 2^16

   Note that the "label address" is simply the memory address of the
   instruction immediately following the label. In particular, the above
   example can be rewritten as JUMP (0).

   A reserved memory address is specified using the "reserve" keyword
   followed by a text_name and (optionally) an integer value. For
   example:

   reserve apples
   reserve pears (8)
   reserve bananas
   LOAD (bananas, 5)

   For any parameters corresponding to a reserved memory address, the
   integer value of the parameter is the next free memory address that
   has not yet been reserved. Starting at this address, the specified
   number of bytes of memory are then reserved (if no value is given
   then a total of 2 bytes is reserved).

   The first instance of a "reserve" keyword begins reserving memory at
   Address 6 (to avoid overwriting the three well-known variables of
   Section 7.2). So the above example can be rewritten as LOAD (16, 5).

   An externally defined keyword is specified outside of the mnemonic
   language. All of the application-defined parameters are considered to
   be externally defined keywords and can be referenced in the mnemonic
   code (useful for adapting the code based on the available memory or
   CPU cycles). The following additional keywords can also be used:






Price, Hannu, et al.                                           [Page 46]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   Keyword:                 Corresponding value:

   byte_copy_left                 0
   byte_copy_right                2
   stack_location                 4
   reserved_end               See below
   bytecode_length            See below
   total_length               See below

   The keyword reserved_end specifies the highest reserved memory
   address for the entire mnemonic code (taking into account all the
   occasions where memory is reserved).

   The keyword bytecode_length specifies the total size of the bytecode
   corresponding to the mnemonic code. Any instances of bytecode_length
   are initially replaced with 3 bytes of zeroes, and then are filled in
   after the remainder of the bytecode has been generated.

   Similarly, the keyword total_length specifies the total amount of
   memory required at the UDVM including bytecode and reserved memory
   addresses.

   A complete description of the mnemonic language and how it should be
   translated into bytecode is given below:

   Instructions:     Instruction names are given in capitals. Replace
                     each name with the corresponding 1-byte value as
                     per Chapter 9.

   $:                When appended to the front of an instruction
                     parameter then the parameter is a memory address
                     rather than a direct value. This symbol is
                     mandatory for reference parameters, optional for
                     multitype parameters and disallowed for literals.

   Integers:         Instruction parameters can be given in the form of
                     decimal integers. They are converted into the
                     shortest bytecode capable of representing the
                     integer by the rules of Section 7.3.

   Text references:  Instruction parameters can also be given in the
                     form of lowercase names. These names should match
                     exactly one label, reserved memory address or
                     externally defined keyword as described above.

   Labels:           Label names are given as a colon followed by
                     lowercase text. They are deleted when converting
                     the mnemonics to bytecode.

   Reserved memory:  Memory addresses are reserved using the "reserve"
                     keyword. The line containing the reserve keyword




Price, Hannu, et al.                                           [Page 47]


INTERNET-DRAFT                  SigComp              February 14 , 2002


                     is deleted when converting to bytecode.

   .LSB:             When appended to the end of a text name, the
                     integer value corresponding to the name is
                     increased by 1. This is useful for addressing the
                     LSBs of a 2-byte variable.

   0b, 0d:           Bytecode values can be specified directly in
                     binary or decimal via the appropriate prefix. The
                     direct bytecode continues until a character occurs
                     that is not an integer or whitespace.

   Whitespace:       All whitespace (plus brackets and commas) just
                     delimit the instructions. Delete.

   Comments:         These are indicated by a semicolon and continue
                     to the end of the line. Delete.

   Once the mnemonic code has been converted into bytecode, it can be
   executed by copying the bytecode into the UDVM memory beginning at
   the first memory address that has not been reserved by an instance of
   the "reserve" keyword. Program execution is assumed to begin at this
   address.

   Note that further to the rules outlined above, well-written mnemonic
   code will also have the following properties:

   *    Any instance of a memory address will be specified as a text
        reference rather than an integer value. This ensures that the
        mnemonic code is portable.

   *    The mnemonic code will not write to any memory address except
        those reserved by the "reserve" keyword. This ensures that the
        code can be compiled.


Appendix B. Example application-defined parameters

   This appendix gives some example values for each of the application-
   defined parameters. These values are geared towards the compression
   of a signaling protocol such as [SIP].

   Note that all of the proposed values are fixed and not negotiated
   between the two instances of the application invoking SigComp. This
   is because it is possible for the application invoking the
   decompressor to receive compressed messages from several different
   applications, and it is difficult to determine which message
   corresponds to which application. [SIP] does this using "From:" and
   "To:" fields in the message itself, but these are not visible until
   the message has been decompressed. It is simpler just to fix a set of
   parameter for every instance of the application.




Price, Hannu, et al.                                           [Page 48]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   UDVM_version                    0
   minimum_compression_ratio       0.5
   maximum_compressed_size         65535
   maximum_uncompressed_size       65535
   minimum_hash_size               6
   overall_memory_size             8192
   working_memory_start            0
   working_memory_end              8191
   cycles_per_bit                  20
   cycles_per_message              2000
   first_instruction               6

   Note that the parameters overall_memory_size, cycles_per_bit and
   cycles_per_message can be increased on the fly using the capabilities
   announcement mechanism. This mechanism is designed to function
   correctly even when the receiving application is sent compressed
   messages from several different applications.

   The initial contents of the UDVM memory also need to be defined. It
   is not enough simply to initialize the memory containing all zeroes,
   as the UDVM would be unable to input any compressed data. Instead,
   for each new compressed message the memory should be initialized
   containing a simple decompressor capable of extracting the first few
   bytes of compressed data. These bytes can then be interpreted as a
   state identifier to retrieve the correct decompression algorithm.

   As an example, the following mnemonic code can be converted to
   bytecode and pasted into the UDVM memory beginning at Address 6:

   reserve state_identifier (6)
   INPUT-BYTECODE (6, state_identifier, fail)
   STATE-EXECUTE (state_identifier, 6)
   :fail
   DECOMPRESSION-FAILURE

   Finally, the application can define initial state that is available
   to the UDVM. Examples of application-defined state include common
   decompression algorithms, dictionaries of common text phrases etc.


Appendix C. Example decompression algorithms

   This appendix gives examples of decompression algorithms which can be
   run on the UDVM in the form of bytecode.

C.1.  Example UDVM code for simple LZ77 decompression

   The first example gives the code required to decompress data from a
   very simple LZ77-based algorithm. The UDVM is instructed to interpret
   a compressed message as a set of 4-byte characters, where each
   character contains a 2-byte position integer followed by a 2-byte




Price, Hannu, et al.                                           [Page 49]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   length integer. Taken together these integers point to a previously
   received text string in the UDVM memory, which is then copied to the
   end of the uncompressed message.

   Since the compressor can only send references to strings already
   present in the UDVM memory, before the first message is decompressed
   the memory must be initialized with a static dictionary containing
   the 256 ASCII characters.

   The algorithm write-protects the memory containing the UDVM
   instructions used to decompress each character, so that they can
   easily be compiled to improve the speed of decompression.

   A 2-byte CRC over the uncompressed message is appended to the end of
   the compressed message, to verify that correct decompression has
   occurred. The algorithm also requests that the contents of the UDVM
   memory be saved using the state request mechanism, so that it can be
   retrieved by sending the appropriate 6-byte hash.

   reserve byte_copy_left
   reserve byte_copy_right
   reserve uncompressed_start
   reserve uncompressed_end
   reserve uncompressed_length
   reserve position
   reserve length
   reserve static_dictionary (256)
   reserve circular_buffer (2048)

   WORKING-MEMORY (uncompressed_start, reserved_end)
   MULTILOAD (0, 7, circular_buffer, reserved_end, static_dictionary,
   circular_buffer, 0, 0, 0)

   :unpack_static_dictionary

   ; The following instructions initialize the static dictionary.

   COPY-LITERAL (position.LSB, 1, $uncompressed_start)
   ADD ($position, 1)
   COMPARE ($position, 256, unpack_static_dictionary, next_character, 0)

   :next_character

   INPUT-FIXED (16, position, fail)
   INPUT-FIXED (16, length, end_of_message)
   COPY-LITERAL ($position, $length, $uncompressed_end)
   ADD ($uncompressed_length, $length)
   JUMP (next_character)

   :fail





Price, Hannu, et al.                                           [Page 50]


INTERNET-DRAFT                  SigComp              February 14 , 2002


   DECOMPRESSION-FAILURE

   :end_of_message

   CRC ($position, $uncompressed_start, $uncompressed_length, fail)
   OUTPUT ($uncompressed_start, $uncompressed_length)
   END-MESSAGE (6, 0, total_length, next_character, 0)


Appendix D. Document history

   - October 19, 2001, version 00

   First version. The draft describes the current ideas, from people
   involved in the ROHC WG, of how to perform compression of
   application signaling messages.

   - October 31, 2001, version 01

   Second version. Additional section, 5.2.1, which describes when a
   message identifier can be reused.

   - November 21, 2001, version 02

   Third version. Section 6 has been moved to a separate draft. The
   third version describes a modular solution, providing flexibility
   for implementers to decide which functions they want to integrate.

   - January 28, 2002, version 03

   Fourth version. SigComp version 02 is divided into this draft, a UDVM
   draft and a extended operation mechanisms draft.
   Compressor/decompressor (UDVM) state approach has been introduced for
   security reasons.

   - February 14, 2002, version 04

   Fifth version. Describes the complete base SigComp solution including
   the UDVM.











   This Internet-Draft expires in August 2002.




Price, Hannu, et al.                                           [Page 51]