Network Working Group                  Richard Price, Siemens/Roke Manor
INTERNET-DRAFT                        Robert Hancock, Siemens/Roke Manor
Expires: May 2001                     Stephen McCann, Siemens/Roke Manor
                                         Mark A West, Siemens/Roke Manor
                                     Abigail Surtees, Siemens/Roke Manor
                                              Christian Schmidt, Siemens

                                                       November 17, 2000

           Efficient Protocol Independent Compression (EPIC)
                      <draft-price-rohc-epic-00.txt>

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of [RFC-2026].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This document is a submission to the IETF ROHC WG. Comments should be
   directed to the mailing list of ROHC, rohc@cdt.luth.se.


Abstract

   The RObust Header Compression [ROHC5] scheme is designed to compress
   packet headers over error prone channels. It is structured into two
   main parts. Firstly there is an extensible core framework covering
   the basic ROHC protocol and packet formats, which is instantiated for
   a particular protocol stack by means of a ROHC profile. Secondly
   [ROHC5] also includes a number of specific profiles for particular
   protocol stacks (RTP/UDP/IP etc.).

   The Efficient Protocol Independent Compression (EPIC) scheme is a
   general mechanism for providing additional profiles for use within
   the ROHC framework. It can produce a set of compressed header formats
   for an arbitrary protocol stack and a chosen level of robustness. As
   with [ROHC5], the decompressed headers are semantically identical to
   the original uncompressed headers. Moreover the headers generated by
   EPIC have the special property that, in a well defined sense, they
   are optimally efficient.

Price et al.                                                  [PAGE 1]


INTERNET-DRAFT                   EPIC                November 17, 2000

Table of contents

   1.  Introduction.................................................3
   2.  Terminology..................................................4
   3.  Existing Solutions...........................................4
   3.1.  ROHC framework.............................................4
   3.2.  Location of EPIC in ROHC...................................4
   3.3.  ROHC profiles..............................................5
   4.  Header compression using EPIC................................5
   4.1.  Compression and decompression procedures...................5
   4.2.  Huffman compression and EPIC inputsets.....................7
   5.  EPIC Inputset................................................9
   5.1.  General parameters.........................................9
   5.1.1.  Lookup tables............................................9
   5.1.2.  Control of header alignment..............................9
   5.1.3.  Available memory.........................................10
   5.2.  Field-specific parameters..................................11
   6.  EPIC Encoding Toolbox........................................12
   6.1.  Toolbox members............................................13
   6.1.1.  STATIC...................................................13
   6.1.2.  IRREGULAR(k).............................................13
   6.1.3.  VALUE(v).................................................13
   6.1.4.  READ(LTABLE,i)...........................................13
   6.1.5.  WRITE(k,LTABLE,i)........................................14
   6.1.6.  LSB(k,p).................................................14
   6.1.7.  UNCOMPRESSED.............................................14
   6.1.8.  INFERRED.................................................14
   6.2.  Hints on applying the toolbox..............................15
   6.2.1.  Compressing variable-length fields.......................15
   6.2.2.  Compressing extension headers............................15
   7.  Control Fields...............................................17
   7.1.  CRC........................................................17
   7.2.  Master Sequence Number.....................................18
   7.3.  Miscellaneous Information..................................18
   8.  Extensibility................................................19
   8.1.  Other protocol stacks......................................19
   8.2.  New toolbox methods........................................19
   8.3.  New control fields.........................................20
   8.4.  Learning version of EPIC...................................20
   9.  EPIC Processing..............................................21
   9.1.  Structure of the Huffman tree..............................21
   9.2.  Compressor operation.......................................22
   9.2.1.  Step 1: Compression front end............................23
   9.2.2.  Step 2: Compressing the uncompressed header..............23
   9.2.3.  Step 3: Adding the UNCOMPRESSED fields...................23
   9.3.  Decompressor operation.....................................23
   9.3.1.  Step 1: Decompressing the compressed header..............23
   9.3.2.  Step 2: Reconstructing the INFERRED fields...............24
   10.  Security considerations.....................................24
   11.  Acknowledgements............................................24
   12.  Intellectual Property Considerations........................24
   13.  References..................................................25
   14.  Authors' Addresses..........................................25
   Appendix A.  EPIC Description in Pseudocode......................26


   A.1.  Structure of the Hierarchical Huffman Tree.................26
   A.2.  Compressor.................................................27
   A.2.1.  Step 1: Compression front end............................27
   A.2.2.  Step 2: Compressing the uncompressed header..............28
   A.2.3.  Step 3: Adding the UNCOMPRESSED fields...................29
   A.3.  Decompressor...............................................29
   A.3.1.  Step 1: Decompressing the compressed header..............29
   A.3.2.  Step 2: Decompression front end..........................30
   A.3.3.  Step 3: Reconstructing the INFERRED fields...............31
   A.4.  Hierarchical Huffman tree construction.....................32
   A.4.1.  Step 1: Process the EPIC inputset........................33
   A.4.2.  Step 2: Resolve complex encoding methods.................33
   A.4.3.  Step 3: Tree pruning.....................................35
   A.4.4.  Step 4: Creating the HH Tree.............................35
   A.5.  EPIC implementation issues.................................38
   A.5.1.  Arithmetic...............................................38
   Appendix B.  Example Profile for RTP/UDP/IPv4....................40
   B.1.  General parameters.........................................40
   B.2.  Field-specific parameters..................................40
   B.3.  Changes required for a random IP ID........................49
   Appendix C.  Proof of Optimal Efficiency.........................50

1.  Introduction

   This document describes a method for generating new compressed header
   formats for [ROHC5]. The Efficient Protocol Independent Compression
   (EPIC) scheme takes as its input a list of fields in the protocol
   stack to be compressed, and for each field a choice of compression
   techniques to be made available for the field. Using this input EPIC
   derives a set of compressed header formats stored in the form of a
   tree. These compressed header formats can then be used to quickly
   compress and decompress headers.

   An important property of EPIC is that for any chosen protocol it
   generates a set of compressed headers for which the average header
   size is provably minimal. In particular, for the same level of
   robustness the compressed headers produced by EPIC are 5-10% more
   efficient than those currently used in [ROHC5]. A subset of EPIC has
   been implemented in accordance with this draft to demonstrate the
   operation of the algorithms, and this is available from [IMPL].

   Chapter 3 gives an overview of [ROHC5] and explains how EPIC fits
   into the ROHC framework.
   Chapter 4 describes the steps involved in the EPIC scheme.
   Chapter 5 lists the inputset needed to create a new set of compressed
   header formats.
   Chapter 6 describes the eight basic compression techniques available
   in the EPIC toolbox.
   Chapter 7 discusses the EPIC control fields.
   Chapter 8 considers the extensibility of EPIC.
   Chapter 9 contains a step-by-step worked example of EPIC in action.
   Appendix A gives a normative description of EPIC in pseudocode.
   Appendix B lists an example EPIC inputset for RTP/UDP/IPv4.
   Appendix C states and proves the optimal efficiency claim for EPIC.


2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC-2119].


3.  Existing Solutions

   This chapter describes the overall ROHC framework and the location of
   EPIC within [ROHC5].

3.1.  ROHC framework

   [ROHC5] defines a general compressed header format that is used for
   any compressed packet as well as feedback and IR (Initialization and
   Refresh) packets. The general header format includes a CID (Context
   Identifier) to indicate the packet stream to which the header
   belongs. It also includes a packet type indicator to specify whether
   the packet is a feedback, initialization or general compressed
   header, whether it is segmented, and whether it contains padding.

   The following packet type indicators are reserved in the overall
   [ROHC5] framework:

   1110:     Padding or Add-CID octet
   11110:    Feedback
   111110:   IR-DYNAMIC packet
   1111110:  IR packet
   1111111:  Segment

   The remaining packet types depend on the protocol stack to be
   compressed. A ROHC profile is the specification of how to compress a
   certain kind of header stream over a certain kind of link. Each ROHC
   profile has its own set of compressed packet formats to complement
   the general packet types common to all ROHC profiles.

   In general the compressed header formats for a profile are octet-
   aligned and do not start with the bit pattern 111XXXXX (since this is
   reserved by the overall ROHC framework).
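   As a quick illustration (not part of the draft itself), this
   reservation amounts to a one-line check on the first octet of a
   profile's compressed header:

```python
def first_octet_allowed(octet: int) -> bool:
    """True if a profile's compressed header may start with this octet.

    The bit patterns 111XXXXX (0xE0-0xFF) are reserved by the overall
    ROHC framework for padding, feedback, IR and segment packets."""
    return (octet & 0b11100000) != 0b11100000
```

   For example, first_octet_allowed(0b11011111) is True, while
   first_octet_allowed(0b11100000) is False.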

   Note that as well as defining a new set of compressed header formats,
   each ROHC profile also describes how the compressor and decompressor
   should be initialized and when to change state or mode.

3.2.  Location of EPIC in ROHC

   The EPIC scheme generates new compressed header formats for use in
   ROHC profiles.

   The packet formats generated by EPIC are fully [ROHC5]-compatible.
   The location of the EPIC compressed header within the general ROHC
   packet is shown below:



        0                           7
       --- --- --- --- --- --- --- ---
      |         Add-CID octet         | if for CID 1-15 and small CIDs
      +---+---+---+---+---+---+---+---+
      |    EPIC compressed header     | 1 octet
      +---+---+---+---+---+---+---+---+
      |                               |
      /    0, 1, or 2 octets of CID   / 1 or 2 octets if large CIDs
      |                               |
      +---+---+---+---+---+---+---+---+
      /    EPIC compressed header     / variable
      +---+---+---+---+---+---+---+---+

        Figure 1 : Location of EPIC within the general ROHC packet


   Note that to ensure ROHC compatibility, the first octet of an EPIC
   compressed header never uses the bit pattern 111XXXXX.

3.3.  ROHC profiles

   [ROHC5] currently contains four profiles:

   Profile 0 for sending uncompressed IP packets
   Profile 1 for RTP/UDP/IP compression
   Profile 2 for UDP/IP compression
   Profile 3 for ESP/IP compression

   The basic function of EPIC is to generate new profiles for [ROHC5].
   Each profile includes a set of compressed header formats that can
   efficiently compress one protocol stack. Within each profile, the
   size of the CRC checksum, length of the sequence number and memory
   requirements for the profile can also be customized.


4.  Header Compression using EPIC

   This chapter outlines the EPIC scheme.

4.1.  Compression and decompression procedures

   Any ROHC-compatible header compression scheme can be considered to
   operate as follows:
   1. For an input uncompressed header, extract the non-redundant
      information from the header fields using available context data as
      appropriate.
   2. Format this information into a compressed header (as in Figure 1)
      for transfer to the decompressor.
   3. Use this information, along with context data at the decompressor,
      to reconstruct the original header.
   This section gives an overview of how each stage is carried out
   using the EPIC scheme.




   Figure 2a illustrates the per-packet processing which is done for
   each header to be compressed and decompressed.

    (1)   |                                                  ^
          | {Uncompressed                      {Uncompressed |
          |    header}                            header}    |
          v                                                  |
        +-----------------+                                  |
        |    Classify     |                                  |
        |     packet      |                                  |
        +-----------------+                                  |
          |                                                  |
          | {Header &                                        |
          |  format}                         +-----------------+
          v                                  | Apply decoding  |
        +-----------------+                  |   methods and   |
        | Apply encoding  |                  |   reconstruct   |
        |     methods     |                  |     header      |
        | (Section A.2.1) |                  | (Section A.3.2) |
        +-----------------+                  +-----------------+
          |                                                  ^
          | {Format &                              {Format & |
          |  M value}                               M value} |
          v                                                  |
    (2) +-----------------+                  +-----------------+
        |      Tree       |                  |      Tree       |
        |    traversal    |                  |    traversal    |
        | (Section A.2.2) |                  | (Section A.3.1) |
        +-----------------+                  +-----------------+
          |                                                  ^
          | {Codewords}                          {Codewords} |
          v                  {Compressed                     |
        +-----------------+    header}   (3) +-----------------+
        |     Header      |                  |     Header      |
        |  formatting &   | ---------------> |   reception &   |
        |  transmission   |                  |    parsing      |
        +-----------------+                  +-----------------+

                Figure 2a : EPIC compression/decompression

   For (1) EPIC carries out two principal steps. The first is to choose
   a "compressed header format" for the uncompressed header. This format
   defines how the non-redundant information will be extracted from each
   field in the uncompressed header. Note that the set of available
   formats depends on the chosen profile, and optionally on other
   factors such as the compressor/decompressor mode.

   For the second step of (1) the information is extracted from each
   field in turn and the results combined into a single bit pattern,
   which can be thought of as a single large integer known as
   the "M value". The process used to do this extraction for any given
   field is called the "encoding method" for that field. The set of
   fundamental encoding methods is called the "EPIC toolbox", and is
   essentially the same as that used for the profiles defined in


   [ROHC5]. The capabilities of EPIC are extensible simply by adding to
   this toolbox (all other parts of the process are unaffected);
   extensibility issues are considered further in Chapter 8.

   The next step (2) involves taking the format and M value and
   constructing the compressed header itself. The header formats are
   stored using a tree, with each leaf node corresponding to exactly one
   compressed header format. EPIC follows the tree from leaf to root,
   using the M value to fill gaps in the compressed header where field
   information is to be stored.

   At the receiver, step (3) essentially carries out equivalent
   operations in reverse. The decompressor has an identical tree, and
   can use the EPIC compressed header to navigate a path through the
   tree starting from the root. This results in a known compressed
   header format and reconstructed M value. Since the set of encoding
   methods used is now known (from the format) the M value can be
   resolved into the non-redundant information from every field,
   reconstructing each field in the uncompressed header.
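   One way to picture the M value is as a mixed-radix integer, with each
   field contributing a value smaller than the number of possibilities
   its encoding method allows. The sketch below is illustrative only
   (the function names are hypothetical, and the draft's normative
   packing is defined by the HH tree of Appendix A):

```python
def combine_m(values, cardinalities):
    # Pack per-field values into one large integer, most significant
    # field first; values[i] must satisfy 0 <= values[i] < cardinalities[i].
    m = 0
    for v, n in zip(values, cardinalities):
        assert 0 <= v < n
        m = m * n + v
    return m

def split_m(m, cardinalities):
    # Inverse operation, as performed at the decompressor once the
    # format (and hence each field's cardinality) is known.
    values = []
    for n in reversed(cardinalities):
        m, v = divmod(m, n)
        values.append(v)
    return list(reversed(values))
```

   For instance, combine_m([1, 2, 3], [4, 5, 6]) yields 45, and
   split_m(45, [4, 5, 6]) recovers [1, 2, 3].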

4.2.  Huffman compression and EPIC inputsets

   Huffman compression is a well known technique used in many popular
   compression schemes. EPIC uses a variant of Huffman encoding to
   produce a new set of compressed header formats for the compressor and
   decompressor. EPIC employs an optimization of the basic Huffman
   technique which reduces memory and processing requirements within the
   compressor/decompressor.

   Basic Huffman encoding is designed to compress an arbitrary stream of
   characters from a character set such as ASCII. The idea is to create
   a new set of compressed characters, where each normal character maps
   onto a compressed character and vice versa. Common characters are
   given shorter compressed equivalents than rarely used characters,
   reducing the average size of the data stream. The compression ratio
   can be improved by increasing the size of one character, but at the
   expense of higher memory requirements. In fact the memory required in
   a Huffman compressor grows exponentially with the character size, so
   16-bit characters need 256 times as much memory as 8-bit characters.
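   For readers unfamiliar with the technique, a minimal sketch of basic
   Huffman code construction (the standard algorithm, not anything
   EPIC-specific) shows how more probable characters receive shorter
   codes:

```python
import heapq

def huffman_code_lengths(freqs):
    # Build a Huffman tree bottom-up and return {symbol: code length}.
    # Repeatedly merging the two least probable subtrees guarantees the
    # minimum possible average code length.
    heap = [(f, i, {sym: 0}) for i, (sym, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # Symbols in the merged subtree sit one level deeper.
        merged = {s: depth + 1 for d in (d1, d2) for s, depth in d.items()}
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]
```

   For example, huffman_code_lengths({'a': 0.5, 'b': 0.25, 'c': 0.25})
   assigns 'a' a 1-bit code and 'b', 'c' 2-bit codes, for an average of
   1.5 bits per symbol.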

   The fundamental idea of EPIC is to apply Huffman encoding to a stream
   of packet headers, where each header is treated as a single
   character. The advantage of this choice is that Huffman encoding is
   optimally efficient for a single character, so applying Huffman to
   the whole header at once means that the average compressed header
   size is provably the minimum possible. The disadvantage of basic
   Huffman is that the character set is extremely large (for a 40 octet
   protocol stack the character set contains up to 2^320 headers).
   Clearly the basic Huffman technique cannot be used in this way.

   The solution lies in a modified version of Huffman encoding known as
   Hierarchical Huffman (HH). The HH algorithm compresses a data stream
   with exactly the same level of efficiency as normal Huffman, but for
   a fraction of the processing and memory cost.


   The input required to build a new set of compressed header formats is
   known as an "EPIC inputset". Each inputset is essentially a
   structured description of the behavior of the protocol stack to be
   compressed (along with some control information). EPIC converts an
   inputset into a suitable form for the HH algorithm, which then builds
   a unique HH tree containing the header formats. Inputsets have a
   recursive structure, which makes it easy to build new inputsets out
   of existing ones (e.g. when adding new layers to a protocol stack).
   In advanced applications of EPIC new inputsets can even be
   transmitted in a simple "profiling" message, which allows dynamic
   updating of the header compression capabilities of a device.

   Figure 2b describes the process of building the HH tree from the EPIC
   inputset, which is done once only and the results stored at the
   compressor and decompressor. References are given to pseudocode in
   Appendix A which describes the various stages explicitly.

                          +-----------------------+
                          |    Input Stage 1:     |
                          |  Enter the inputset   |
                          |    which describes    |
                          |  the protocol stack   |
                          |    (Section A.4.1)    |
                          +-----------------------+
                                      |
                                      v
                          +-----------------------+
                          |    Input Stage 2:     |
                          |    Resolve complex    |
                          | encoding methods into |
                          |  fundamental methods  |
                          |    (Section A.4.2)    |
                          +-----------------------+
                                      |
                                      v
                          +-----------------------+
                          |    Input Stage 3:     |
                          | Prune low probability |
                          |    header formats     |
                          |    (Section A.4.3)    |
                          +-----------------------+
                                      |
                                      v
                          +-----------------------+
                          |    Input Stage 4:     |
                          |  Run HH algorithm to  |
                          |     generate tree     |
                          |    (Section A.4.4)    |
                          +-----------------------+

                     Figure 2b : HH Tree Construction

   The following chapters describe the mechanisms of EPIC in greater
   detail.


5.  EPIC Inputset

   This chapter describes the inputset that EPIC requires to create a
   new set of compressed header formats. Section 5.1 describes the
   general inputset parameters common to the entire protocol stack,
   whilst Section 5.2 lists the inputset parameters specific to each
   field.

5.1.  General inputset parameters

   The general parameters are used to configure the lookup tables, the
   alignment of the compressed headers and the amount of memory reserved
   for EPIC at the compressor and decompressor.

5.1.1.  Lookup tables

   One of the compression techniques offered by EPIC is table-based item
   compression as described in Section 5.8.1 of [ROHC5]. This
   compression method allows an item of data to be transmitted in full
   together with an index. The item is stored in a lookup table at the
   position specified by index, and once the compressor gains enough
   confidence that the decompressor has successfully received the
   mapping it can send the item by transmitting its index only.

   EPIC allows multiple lookup tables to be set up, where each lookup
   table is capable of storing a fixed number of items. Note that
   smaller lookup tables achieve better compression efficiency since
   fewer bits are required to index an item in the table.

   Each lookup table is given a unique name by which it can be
   referenced. Lookup tables are initialized as shown in the example
   below:

   LTABLE{"0000","0101","110111"}

   The name of the lookup table is given, followed by a list of bit
   patterns that represent the initial entries in the lookup table. Note
   that a single lookup table may sometimes store bit patterns of
   several different lengths (for example when the lookup table is used
   to store values from a variable-length field).

   Lookup tables are accessed and updated using a pair of encoding
   techniques as described in Chapter 6.
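   The table mechanics above can be sketched as follows (illustrative
   only: the class and method names are not part of the draft, and the
   confidence logic governing when the compressor may switch from
   sending the item to sending the index alone is omitted):

```python
class LookupTable:
    """A named table of bit patterns, used for indexed compression."""

    def __init__(self, name, entries):
        self.name = name
        self.entries = list(entries)  # initial bit patterns (strings)

    def write(self, item, index):
        # WRITE: the item is sent in full together with its index, and
        # both ends store it at that position in the table.
        self.entries[index] = item

    def read(self, index):
        # READ: once the mapping is established, only the index is
        # sent; the decompressor recovers the item from its table.
        return self.entries[index]

# A two-entry table like the TTL example of Section 5.2.
ttl = LookupTable("TTL", ["", ""])
ttl.write("01000000", 0)   # a TTL value sent in full once
assert ttl.read(0) == "01000000"
```

   With a two-entry table, only one bit is needed to index an item,
   which is why smaller tables compress better.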

5.1.2.  Control of header alignment

   The alignment of the compressed headers is controlled using the ALIGN
   and NPATTERNS parameters. Each of the parameters is an integer.

   All of the compressed headers produced by EPIC are guaranteed to be
   an integer multiple of ALIGN bits long. Moreover, the first word of
   each compressed header (where one word is ALIGN bits) uses only
   NPATTERNS of the 2^ALIGN different bit patterns available.



   For example, suppose that ALIGN = 8 (the octet-aligned case) and
   NPATTERNS = 224. In the first octet of every compressed header only
   224 of the available 256 bit patterns are used by EPIC, namely the
   bit patterns 00000000, 00000001 through to 11011111 inclusive. The
   bit patterns which are not used are 11100000 through to 11111111.

   In fact, it is important for EPIC not to use the bit patterns
   111XXXXX in the first octet of each compressed header because they
   are reserved by the ROHC framework. The correct parameters for EPIC
   to produce [ROHC5]-compatible compressed header formats are ALIGN = 8
   and NPATTERNS = 224.
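   The numbers in this example can be checked directly: with ALIGN = 8
   and NPATTERNS = 224, the set of first-octet values used by EPIC is
   disjoint from the 111XXXXX range reserved by ROHC (a sketch, not
   normative):

```python
ALIGN, NPATTERNS = 8, 224
used = set(range(NPATTERNS))                  # 00000000 .. 11011111
reserved = set(range(0b11100000, 2 ** ALIGN)) # 11100000 .. 11111111
assert used.isdisjoint(reserved)
assert len(used) + len(reserved) == 2 ** ALIGN
```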

   Some useful values for ALIGN and NPATTERNS are listed below:

   Compressed Headers           ALIGN   NPATTERNS

   Bit-aligned                  1       2
   Octet-aligned                8       256
   Octet-aligned with           8       224
     111XXXXX reserved

5.1.3.  Available memory

   The MAXFORMATS and PTHRESH parameters control the number of
   compressed header formats to be stored at the compressor and
   decompressor.

   The MAXFORMATS parameter determines the maximum number of compressed
   header formats stored by EPIC. If more compressed header formats are
   generated than can be stored then EPIC discards all but the
   MAXFORMATS formats most likely to be used. Any packet requiring a
   compressed header format that has been discarded is communicated
   using a DYNAMIC packet instead.

   Similarly, the PTHRESH parameter specifies a probability threshold
   for the compressed header formats. If any format is used with less
   than PTHRESH probability then it is discarded to conserve memory at
   the compressor and decompressor. A DYNAMIC packet can be used to
   communicate the information instead.
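   A sketch of the pruning implied by these two parameters (the function
   name is hypothetical; the draft's normative pruning step is Section
   A.4.3):

```python
def prune_formats(formats, maxformats, pthresh):
    # formats: list of (probability, format_id) pairs.
    # Drop anything below the probability threshold, then keep only
    # the maxformats most probable of what remains.
    kept = [f for f in formats if f[0] >= pthresh]
    kept.sort(key=lambda f: f[0], reverse=True)
    return kept[:maxformats]
```

   For example, with MAXFORMATS = 2 and PTHRESH = 0.01, the formats
   [(0.5, "A"), (0.001, "B"), (0.3, "C"), (0.1, "D")] reduce to
   [(0.5, "A"), (0.3, "C")]; packets needing "B" or "D" fall back to a
   DYNAMIC packet.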

   The MAXFORMATS and PTHRESH parameters affect the EPIC compressed
   header formats, and so for interoperability they MUST be specified as
   part of the EPIC inputset.

   Note that one compressed header format requires approximately 2
   octets of storage space at the compressor and decompressor. Since the
   header formats are only calculated once, it is possible to store them
   in read only memory.

   Note also that EPIC requires additional working memory to compress
   and decompress the headers, to store the lookup tables and so on.
   However, EPIC places no extra working memory requirements on the
   compressor and decompressor compared to [ROHC5].



5.2.  Field-specific parameters

   EPIC contains a toolbox of eight basic compression techniques (LSB
   compression, lookup table compression etc.). The field-specific
   parameters indicate which encoding methods from the toolbox should be
   applied to which fields in the protocol stack.

   The basic plan is to build up more complex encoding methods from the
   set of eight techniques in the encoding toolbox (Chapter 6). This
   allows us to construct an encoding method sufficiently powerful to
   handle the entire protocol stack.

   A new encoding method is described by means of a table as shown below
   (N.B. the normative version of IPv4 encoding is found in Appendix B):

   IPv4-ENCODING: TTL{"",""}

   +------------------------+----------------------------+-------------+
   |      Field Name(s)     |      Encoding Method       | Probability |
   +------------------------+----------------------------+-------------+
   |        Version         |           STATIC           |     100%    |
   |     Header Length      |                            |             |
   |     Reserved Flag      |                            |             |
   |   May Fragment Flag    |                            |             |
   |   Last Fragment Flag   |                            |             |
   |    Fragment Offset     |                            |             |
   |     Source Address     |                            |             |
   |  Destination Address   |                            |             |
   +------------------------+----------------------------+-------------+
   |     Packet Length      |          INFERRED          |     100%    |
   |    Header Checksum     |                            |             |
   |        Protocol        |                            |             |
   +------------------------+----------------------------+-------------+
   |    Type of Service     |           STATIC           |    99.9%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(8)        |     0.1%    |
   +------------------------+----------------------------+-------------+
   |     Identification     |           STATIC           |    99.9%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(16)       |     0.1%    |
   +------------------------+----------------------------+-------------+
   |      Time to Live      |           STATIC           |      99%    |
   |                        +----------------------------+-------------+
   |                        |         READ(TTL,2)        |     0.9%    |
   |                        +----------------------------+-------------+
   |                        |       WRITE(8,TTL,2)       |     0.1%    |
   +------------------------+----------------------------+-------------+

   IPv4-ENCODING is the name of the new encoding method. TTL{"",""} is a
   lookup table containing two entries, which is used to store the two
   most recent values of the Time to Live field. This lookup table is
   useful because the TTL field sometimes alternates between two
   different values (this is possible, for example, in the case of load
   sharing over paths of unequal hop counts).
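   The READ/WRITE behavior of such a lookup table can be sketched as
   follows. This is a minimal illustration only (class and method names
   are ours, not the normative pseudocode of Appendix A):

```python
class LookupTable:
    """Fixed-size table of recent field values, e.g. TTL{"",""}
    holding the two most recent Time to Live values."""

    def __init__(self, size):
        self.size = size
        self.entries = []           # most recent value first

    def write(self, value):
        # WRITE: send the field in full and store it in the table,
        # evicting the oldest entry when the table is full
        if value in self.entries:
            self.entries.remove(value)
        self.entries.insert(0, value)
        del self.entries[self.size:]

    def read(self, index):
        # READ: recover a field value from its position in the table
        return self.entries[index]

ttl = LookupTable(2)
ttl.write(64)
ttl.write(63)    # e.g. load sharing over a path with one more hop
assert ttl.read(0) == 63 and ttl.read(1) == 64
```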

Price et al.                                                 [PAGE 11]


INTERNET-DRAFT                   EPIC                November 17, 2000

   The Field Name(s) column contains a list of field names from the
   protocol stack to be compressed (in this case an IPv4 header).

   Notice that there is no length indication for any of the fields. This
   is because the length of the field can be inferred from the field
   name. EPIC can also cope with variable-length fields (as described in
   Section 6.2.1).

   For each field (or set of fields) the table includes a list of
   possible encoding methods. The encoding methods can be chosen from
   the set of eight techniques in the encoding toolbox of Chapter 6 (in
   the example the STATIC, IRREGULAR, LSB, READ, WRITE and INFERRED
   encoding methods are all from the toolbox). Alternatively, the
   encoding methods can be defined by their own tables as shown below:

   RTP/UDP/IPv4-ENCODING:

   +------------------------+----------------------------+-------------+
   |      Field Name(s)     |      Encoding Method       | Probability |
   +------------------------+----------------------------+-------------+
   |          CRC           |        IRREGULAR(3)        |     100%    |
   +------------------------+----------------------------+-------------+
   | Master Sequence Number |         LSB(4,-1)          |     100%    |
   +------------------------+----------------------------+-------------+
   |      IPv4 Header       |       IPv4-ENCODING        |     100%    |
   +------------------------+----------------------------+-------------+
   |       UDP Header       |        UDP-ENCODING        |     100%    |
   +------------------------+----------------------------+-------------+
   |       RTP Header       |        RTP-ENCODING        |     100%    |
   +------------------------+----------------------------+-------------+

   The RTP/UDP/IPv4 encoding method uses the IPv4 encoding method
   defined previously. In this manner it is possible to build some
   powerful and flexible encoding methods.
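   One possible in-memory representation of such hierarchical tables is
   sketched below. The data layout is purely illustrative (the draft
   does not prescribe one); only a few rows are shown:

```python
# Each encoding method maps to rows of (field names, list of
# (encoding method, probability)); a row's method may itself name
# another table, as RTP/UDP/IPv4-ENCODING names IPv4-ENCODING.
ENCODINGS = {
    "IPv4-ENCODING": [
        (["Type of Service"], [("STATIC", 0.999), ("IRREGULAR(8)", 0.001)]),
        (["Time to Live"], [("STATIC", 0.99), ("READ(TTL,2)", 0.009),
                            ("WRITE(8,TTL,2)", 0.001)]),
    ],
    "RTP/UDP/IPv4-ENCODING": [
        (["CRC"], [("IRREGULAR(3)", 1.0)]),
        (["Master Sequence Number"], [("LSB(4,-1)", 1.0)]),
        (["IPv4 Header"], [("IPv4-ENCODING", 1.0)]),
    ],
}

# Sanity check: the probabilities in each row must sum to 100%
for fields_row in ENCODINGS.values():
    for fields, methods in fields_row:
        assert abs(sum(p for _, p in methods) - 1.0) < 1e-9
```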

   The final column lists the probability that each encoding method will
   be used to encode the field in question. This is a very important
   column since EPIC ensures that common encoding methods require fewer
   bits to carry the compressed data than rarely used encoding methods.
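   The connection between probability and compressed size follows from
   prefix coding: a format used with probability p needs roughly
   -log2(p) bits to identify. A quick illustration, using the
   probabilities from the Time to Live row above:

```python
import math

# Time to Live row of IPv4-ENCODING
probs = {"STATIC": 0.99, "READ(TTL,2)": 0.009, "WRITE(8,TTL,2)": 0.001}

# An ideal prefix code spends about -log2(p) bits to indicate an
# encoding method of probability p, so common methods cost least
ideal_bits = {m: -math.log2(p) for m, p in probs.items()}
assert (ideal_bits["STATIC"] < ideal_bits["READ(TTL,2)"]
        < ideal_bits["WRITE(8,TTL,2)"])
```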

   Note that the order in which fields are listed, and the order of the
   encoding methods for a particular field, is significant in the
   following sense:
   - changing the order does not affect the compression/decompression
   performance of the profile; however
   - changing the order does modify the format and content of the
   compressed headers themselves.

   Therefore, for interoperability, both the compressor and decompressor
   MUST have a common set of input tables including all ordering
   considerations.




6.  EPIC Encoding Toolbox

   ROHC can encode header fields using a number of different encoding
   techniques (LSB encoding, scaled timestamp encoding, list-based
   compression etc.). The EPIC encoding toolbox supports all of the
   techniques currently required by [ROHC5]. Different encoding methods
   can be assigned to different fields using a simple table of inputs as
   described in Section 5.2.

6.1.  Toolbox members

   The EPIC encoding toolbox contains the following techniques:

6.1.1.  STATIC

   The STATIC encoding method can be used when the header field does not
   change relative to the context. If a field is STATIC then no
   information concerning the field need be transmitted in the
   compressed header.

   The STATIC command has no parameters.

6.1.2.  IRREGULAR(k)

   The IRREGULAR(k) encoding method is used when the field cannot be
   compressed relative to the context, and hence must be transmitted in
   full.

   The parameter k is the (uncompressed) field length in bits.

   Note that it is possible to use the IRREGULAR encoding method to
   compress variable-length fields. This is accomplished by specifying a
   separate IRREGULAR(k) encoding method for every common length of the
   variable-length field. Further information can be found in Section
   6.2.1.

6.1.3.  VALUE(v)

   The VALUE(v) encoding method has a single parameter v, which is a
   field value represented as a bit pattern. The VALUE(v) encoding can
   be used to transmit any field which takes value v.

   Note that VALUE(v) is just a simplified form of lookup table
   compression. It is useful for compressing fields which take a well-
   known value with high probability.

6.1.4.  READ(LTABLE,i)

   The READ(LTABLE,i) encoding method retrieves the field value from the
   lookup table LTABLE. Recall that EPIC maintains a number of distinct
   lookup tables for storing and retrieving common field values. If the
   value of a field is stored in the lookup table LTABLE then it can be
   efficiently compressed by sending its position in the lookup table
   only.

   The parameter LTABLE is the name of the lookup table. The parameter i
   is the number of items which the lookup table can store.

   Common field values can be stored in the lookup table in two ways.
   Firstly they might be stored in the table at initialization of the
   compressor/decompressor. Secondly they can be updated by using the
   WRITE encoding method.

   If the value of the field is not present in any lookup table then the
   READ encoding method cannot be used to compress the field.

6.1.5.  WRITE(k,LTABLE,i)

   The WRITE encoding method sends the field in full and also stores it
   in the lookup table LTABLE. Subsequently, whenever the field value
   recurs it can be retrieved from the lookup table using the
   READ(LTABLE,i) encoding method (provided of course that the WRITE
   method has been used sufficient times to ensure robustness).

   The WRITE command has the following 3 parameters:

   k:      Uncompressed field length in bits
   LTABLE: Name of lookup table
   i:      Number of values which the lookup table can store

6.1.6.  LSB(k,p)

   The LSB(k,p) method compresses the field using LSB encoding.

   LSB encoding is described in Section 4.5.1 of [ROHC5]. The meaning of
   the two parameters k and p is described in [ROHC5], but basically the
   parameters have the following functions:

   k:   Number of LSBs to transmit in the compressed header
   p:   Offset of interpretation interval
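   The following sketch shows the idea of LSB encoding and the
   interpretation interval. It is a simplification of [ROHC5] Section
   4.5.1, with variable names of our own choosing:

```python
def lsb_encode(value, k):
    # Transmit only the k least significant bits of the field
    return value & ((1 << k) - 1)

def lsb_decode(lsbs, k, p, v_ref):
    # The decompressor picks the unique value whose k LSBs match,
    # inside the interval [v_ref - p, v_ref - p + 2^k - 1]
    lo = v_ref - p
    candidate = (lo & ~((1 << k) - 1)) | lsbs
    if candidate < lo:
        candidate += 1 << k
    return candidate

# LSB(4,-1): interval [v_ref + 1, v_ref + 16], suited to fields
# that always increase, such as the Master Sequence Number
v_ref, v = 983, 985
assert lsb_decode(lsb_encode(v, 4), 4, -1, v_ref) == v
```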

6.1.7.  UNCOMPRESSED

   The UNCOMPRESSED encoding method is used to send the field value in
   full at the end of the compressed header.

   Note that all UNCOMPRESSED fields are sent concatenated together at
   the end of the compressed header. This means that the decompressor
   needs to know the length of each UNCOMPRESSED field in order to
   reconstruct the uncompressed header. The way in which the field
   length is inferred is specific to each UNCOMPRESSED field, and is
   described separately from the table of the EPIC inputset.

6.1.8.  INFERRED

   The INFERRED method instructs EPIC to infer the value of the field
   from other field values and/or the context.



   The INFERRED method has no parameters. The precise method by which
   INFERRED fields are reconstructed is specific to each field, and is
   described separately from the table of the EPIC inputset.

6.2.  Hints on applying the toolbox

   This section gives some hints on how the EPIC toolbox can be applied
   to compress different types of field.

6.2.1.  Compressing variable-length fields

   EPIC can compress variable-length fields efficiently by allocating a
   separate encoding method for each possible length of the field. This
   takes into account the probability distribution of the field length,
   so more common field sizes can be compressed with better efficiency.

   This technique can be used to compress the GRE Key field as follows:

   GRE-ENCODING:

   +------------------------+----------------------------+-------------+
   |      Field Name(s)     |      Encoding Method       | Probability |
   +------------------------+----------------------------+-------------+
   |          ...           |            ...             |     ...     |
   +------------------------+----------------------------+-------------+
   |    GRE Key Present     |          INFERRED          |     100%    |
   +------------------------+----------------------------+-------------+
   |        GRE Key         |           STATIC           |      99%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(0)        |     0.5%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(32)       |     0.5%    |
   +------------------------+----------------------------+-------------+
   |          ...           |            ...             |     ...     |
   +------------------------+----------------------------+-------------+

   The GRE Key field has two possible lengths (0 octets or 4 octets). If
   the key is not static then it can be sent in full with either of
   these two lengths. The GRE Key Present field is inferred from the
   size of the GRE Key.

   Note that if the field has many different lengths, for simplicity
   only the most common field lengths need be sent using the
   IRREGULAR(k) encoding. Fields with less common lengths can be sent
   UNCOMPRESSED.

6.2.2.  Compressing extension headers

   Protocols such as IP have a number of optional extensions that
   follow the basic header. In [ROHC5] these extensions are handled by
   defining a new list-based compression type. In EPIC however, the
   extension headers can be handled using the basic toolbox encoding
   methods with no special enhancements.


   An example encoding method for the AH header is given below:

   RTP/UDP/IPv4-ENCODING:

   +------------------------+----------------------------+-------------+
   |      Field Name(s)     |      Encoding Method       | Probability |
   +------------------------+----------------------------+-------------+
   |          ...           |            ...             |     ...     |
   +------------------------+----------------------------+-------------+
   | Authentication Header  |        AH-ENCODING         |      99%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(0)        |       1%    |
   +------------------------+----------------------------+-------------+
   |          ...           |            ...             |     ...     |
   +------------------------+----------------------------+-------------+

   N.B. The term "Authentication Header" in the Field Name(s) column is
   just shorthand for the set of fields in the authentication header.

   AH-ENCODING: COMMONLENGTHS{"00","03","04","05","06"}
                AH-SPI{"","","","",""}

   +------------------------+----------------------------+-------------+
   |      Field Name(s)     |      Encoding Method       | Probability |
   +------------------------+----------------------------+-------------+
   |     AH Next Header     |          INFERRED          |     100%    |
   |      AH Reserved       |                            |             |
   +------------------------+----------------------------+-------------+
   |       AH Length        |           STATIC           |    99.9%    |
   |                        +----------------------------+-------------+
   |                        |   READ(COMMONLENGTHS,5)    |    0.09%    |
   |                        +----------------------------+-------------+
   |                        |  WRITE(8,COMMONLENGTHS,5)  |    0.01%    |
   +------------------------+----------------------------+-------------+
   |         AH SPI         |           STATIC           |    99.9%    |
   |                        +----------------------------+-------------+
   |                        |       READ(AH-SPI,5)       |   0.099%    |
   |                        +----------------------------+-------------+
   |                        |     WRITE(32,AH-SPI,5)     |   0.001%    |
   +------------------------+----------------------------+-------------+
   |   AH Sequence Number   |          INFERRED          |      99%    |
   |                        +----------------------------+-------------+
   |                        |         LSB(8,-1)          |     0.9%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(32)       |     0.1%    |
   +------------------------+----------------------------+-------------+
   | AH Authentication Data |        UNCOMPRESSED        |     100%    |
   +------------------------+----------------------------+-------------+

   The AH header is encoded in the same manner as any other header in
   the protocol stack. The only difference is that the entire AH header
   may not be present in the stack, so an additional IRREGULAR(0)
   encoding is required for the whole AH header. This encoding is used
   to delete every field in the AH header when it is no longer present
   in the uncompressed packet.

   Note that IP extension headers do not necessarily occur in the same
   order for every packet stream. This is not a problem for EPIC
   because each field in the protocol stack is transmitted separately
   (and not necessarily in order). The individual fields are
   reconstructed at the decompressor and then placed in the
   uncompressed header at their correct locations.

   If the order of any fields changes then EPIC transmits the fields
   uncompressed. When this information has been sent sufficient times
   to ensure robustness, the decompressor knows the new ordering of the
   fields and can subsequently reconstruct the uncompressed headers
   using the new field ordering as appropriate.


7.  Control fields

   In general the fields specified in the EPIC input tables are taken
   directly from the protocol stack to be compressed. However there are
   a small number of special fields that can be transmitted in the
   compressed headers, but which are not present in the uncompressed
   packets. These fields are known as control fields because they
   transmit control information from the compressor to the decompressor.

   The most obvious example of a control field is the header compression
   CRC. This CRC is calculated over the uncompressed header, and is used
   to detect incorrect decompression at the decompressor.

   EPIC treats the control fields in the same manner as any normal
   field. The only difference is that at the compressor the control
   field values are calculated rather than obtained directly from the
   uncompressed packet. Similarly the control field values are used by
   the decompressor itself (e.g. to check the header for errors) and are
   not forwarded in the decompressed packet.

   EPIC supports the following control fields:

7.1.  CRC

   The CRC field holds a CRC checksum calculated across the entire
   uncompressed header, as described in Section 5.9.2 of [ROHC5]. At the
   decompressor the CRC field is used to validate that correct
   decompression has occurred.
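   A bitwise sketch of such a CRC is given below. The generator
   polynomial x^3 + x + 1 is the 3-bit polynomial of [ROHC5], but the
   bit ordering and initial value here are illustrative, not normative:

```python
def crc3(data: bytes) -> int:
    # Bitwise CRC with generator polynomial x^3 + x + 1
    crc = 0b111                      # illustrative initial value
    for byte in data:
        for i in range(8):
            bit = (byte >> (7 - i)) & 1
            top = (crc >> 2) & 1
            crc = (crc << 1) & 0b111
            if top ^ bit:
                crc ^= 0b011         # taps of x^3 + x + 1
    return crc

header = b"\x45\x00\x00\x28"         # start of an example IPv4 header
assert 0 <= crc3(header) <= 7
# a single-bit error always changes a CRC, so it is detected
corrupt = bytes([header[0] ^ 0x01]) + header[1:]
assert crc3(header) != crc3(corrupt)
```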

   Note that it is possible for different header formats to have
   different amounts of CRC protection, so extra CRC bits can be
   allocated to protect important context-updating information. This is
   accomplished by creating a new encoding method for the fields to be
   given extra CRC protection as shown in the example below:




   EXAMPLE-ENCODING:

   +------------------------+----------------------------+-------------+
   |      Field Name(s)     |      Encoding Method       | Probability |
   +------------------------+----------------------------+-------------+
   |    Important Field     |         NORMAL-CRC         |      50%    |
   |          CRC           +----------------------------+-------------+
   |                        |        IRREGULAR(8)        |      50%    |
   +------------------------+----------------------------+-------------+

   NORMAL-CRC:

   +------------------------+----------------------------+-------------+
   |      Field Name(s)     |      Encoding Method       | Probability |
   +------------------------+----------------------------+-------------+
   |    Important Field     |           STATIC           |     100%    |
   +------------------------+----------------------------+-------------+
   |          CRC           |        IRREGULAR(3)        |     100%    |
   +------------------------+----------------------------+-------------+

   Observe that the compressed header has 3 bits of CRC protection if
   the important field is STATIC. However, if the field is IRREGULAR(8)
   then the compressed header has 8 bits of CRC protection. This is
   useful since the IRREGULAR behavior is context-updating.

7.2.  Master Sequence Number

   The RTP/UDP/IP protocol stack contains many different sequence
   numbers (the RTP sequence number, the ESP sequence number, the AH
   sequence number and so on). One sequence number must be transmitted
   to protect against bursts of consecutive lost packets, but it is then
   possible to infer all other sequence numbers by linear extrapolation
   (i.e. by adding or subtracting a constant integer).

   Unfortunately, no sequence number field is guaranteed to be present
   in every IP protocol stack. The RTP sequence number is not visible if
   the IP payload is encrypted, whilst the ESP and AH sequence numbers
   are only included if the optional ESP and AH headers are present. To
   overcome this problem EPIC defines a control field called the Master
   Sequence Number field. This field is present in every compressed
   header and can be used to infer all other sequence numbers.

   The size of the Master Sequence Number field is chosen for the best
   trade-off between robustness and efficiency. A Master Sequence
   Number of k bits prevents context failure for up to 2^k - 1
   consecutive lost packets (for example, a 4-bit field protects
   against bursts of up to 15 consecutive losses).

7.3.  Miscellaneous Information

   The Miscellaneous Information fields transmit any extra information
   that EPIC needs to reconstruct the UNCOMPRESSED and INFERRED fields,
   but which is not already present in the compressed header.



   Some examples of Miscellaneous Information fields used in RTP/UDP/IP
   compression are given below:

   TS: The Least Significant Bits (LSBs) of the RTP Timestamp, scaled
   if Tsc = 1.

   TS STRIDE: The predicted increment/decrement of the RTP Timestamp
   field when it changes. The use for this field is explained in
   Section 4.5.4 of [ROHC5].

   TIME STRIDE: The predicted time interval (in milliseconds) between
   changes in the RTP Timestamp. The use for this field is explained in
   Section 4.5.4 of [ROHC5].

   Tsc: Indicates whether the timestamp is scaled or not.

   All of these fields are needed to infer the RTP timestamp. Note that
   Miscellaneous Information fields are control fields because they are
   discarded once they have been used to infer the relevant
   UNCOMPRESSED or INFERRED fields.


8.  Extensibility

   This chapter considers the extensibility of the EPIC scheme,
   including possible future enhancements to EPIC.

8.1.  Other protocol stacks

   Especially for multimedia over IP, additional protocols will become
   important in the future, e.g. tunneling protocols for virtual
   private networks and unreliable SCTP for multimedia. These protocols
   will generate further protocol overhead for transmission and new
   requirements for efficient header compression.

   EPIC requires minimal effort to generate a new ROHC profile. The
   input to EPIC is the behavior of the new protocol stack (i.e. which
   fields require which encoding techniques). EPIC then produces an
   optimally efficient set of compressed header formats for use at the
   compressor and decompressor.

8.2.  New toolbox methods

   The eight encoding techniques currently offered by the EPIC toolbox
   are sufficient to compress the entire RTP/UDP/IP protocol stack
   including the CSRC list and IP extension headers. However, if EPIC
   is used to compress other protocol stacks then it may become
   necessary to add new encoding methods to the toolbox.

   In general, a new toolbox encoding method is just a mapping between
   a set of uncompressed field values and their compressed equivalents.
   The only requirement is that the mapping is 1-1 (in other words that
   no two field values map onto the same compressed value). It is
   acceptable for the mapping to change depending on the context,
   although care must be taken in this case to ensure that it is
   robust. For example, a DELTA encoding method that stores the
   increase in the field value relative to the context is very
   efficient for compressing incrementing fields. However it is not
   robust to lost packets (since it fails if the decompressor context
   is incorrect).

8.3.  New control fields

   It may also be necessary to extend EPIC by adding extra control
   fields. Recall that control fields communicate information between
   the compressor and decompressor that is not forwarded in the
   uncompressed header. Since control fields are treated in exactly the
   same manner as normal header fields (except that they are not
   present in the uncompressed header), the only change required is a
   description of how the control field is calculated at the compressor
   and how it is used at the decompressor.

8.4.  Learning version of EPIC

   As stated in the introduction, the EPIC compressed header formats
   are optimal for the chosen protocol stack. This means that, for the
   chosen header alignment and level of robustness, it is provably
   impossible to compress a protocol stack for which EPIC is
   programmed more efficiently than EPIC itself does.

   An interesting question, however, is the effectiveness of EPIC when
   compressing a protocol stack for which it is not programmed. If the
   correct encoding methods have been assigned to each field but the
   probabilities that they will be used are slightly inaccurate then
   the efficiency lost is negligible.

   But suppose that the protocol stack behaves in a significantly
   different manner than expected by ROHC: for example if an RTP stream
   uses padding or even RTP extensions. Static compressors with fixed
   compressed header formats suffer a permanent drop in performance if
   the protocol stack behaves in an unusual manner. However, since EPIC
   can dynamically generate new packet formats as they are needed, it
   is possible for an EPIC-generated ROHC profile to adapt its own
   header formats to the incoming packet stream.

   A basic version of "Learning" EPIC is straightforward to implement.
   The encoding methods assigned to each field are not changed; instead
   the probabilities that each encoding method will be used are refined
   by counting the number of times they have been used by recent
   packets. In this manner, if a particular compressed header format is
   used more often than expected then the Huffman tree will be refined
   to encode the format using fewer bits.
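   The refinement step can be sketched as follows (the window contents
   and format names are invented for illustration):

```python
from collections import Counter

# Hypothetical log of which compressed header format was chosen
# for each of the last few packets
window = ["F1", "F1", "F2", "F1", "F3", "F1", "F1", "F2"]

counts = Counter(window)
total = sum(counts.values())

# Refined probabilities for rebuilding the Huffman tree: formats
# used more often than expected get shorter codes next time
probs = {fmt: n / total for fmt, n in counts.items()}
assert abs(sum(probs.values()) - 1.0) < 1e-9
assert probs["F1"] > probs["F2"] > probs["F3"]
```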

   Fast algorithms exist for updating a Huffman tree when only a small
   number of inputs have changed. Care must be taken to ensure that the
   Huffman trees at the compressor and decompressor are updated in
   step; this can be accomplished by updating the tree periodically at
   the compressor and sending the refined version to the decompressor
   within a special "profiling" message.


9.  EPIC Processing

   This chapter presents an outline description of the EPIC processing
   without any algorithmic or computational notation. The detailed
   pseudocode description of EPIC is presented in Appendix A.

   There are four main aspects to the pseudocode:

   Structure of the Huffman tree
   Compressor operation
   Decompressor operation
   Tree construction

9.1.  Structure of the Huffman Tree

   Recall that EPIC uses a Hierarchical Huffman (HH) tree to succinctly
   store a set of compressed header formats. A diagram of an HH tree can
   be found in Figure 3.

   The HH tree consists of a set of nodes joined by branches.
   Conventionally the nodes are arranged vertically, with leaf nodes at
   the top and the root node at the bottom.

   There is a single root node, which is the starting point for
   decompression operations. There is a set of leaf nodes, which are
   used as the starting points for compression operations. Each
   compressed header format is associated with a specific leaf node and
   vice versa. All other nodes are interior nodes.

   The information needed for compression and decompression is contained
   in the tree topology (i.e. the connectivity of nodes and branches)
   and the labels on each branch. Information is also associated with
   each node whilst the tree is being constructed, but this can be
   discarded once the tree has been built.

   Note that this "tree" is multiply connected, i.e. there are several
   alternative paths between any given leaf node and the root. The
   correct route to follow is determined by the compression and
   decompression algorithms.

   An example of a bit-aligned HH tree is given in Figure 3. The example
   considers a very simple protocol stack with two fields labeled A and
   B. Four compressed header formats are available to compress this
   protocol stack. Each header format is placed in a leaf node of the
   Huffman tree, together with the probability "P" that the format will
   be used to compress a header and the number "N" of distinct headers
   that can be compressed using the format.

   The procedure for generating the tree itself is described in Appendix
   A.4.
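   For contrast with the hierarchical construction of Appendix A.4, a
   plain Huffman construction over four leaf probabilities can be
   sketched as follows (the probabilities here are illustrative, not
   those of Figure 3):

```python
import heapq

def code_lengths(probs):
    # Repeatedly merge the two least probable subtrees; every leaf
    # inside a merged subtree gains one bit of code length
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, m1 = heapq.heappop(heap)
        p2, m2 = heapq.heappop(heap)
        for leaf in m1 + m2:
            lengths[leaf] += 1
        heapq.heappush(heap, (p1 + p2, m1 + m2))
    return lengths

# Four formats with probabilities 50%, 25%, 12.5% and 12.5% receive
# codes of 1, 2, 3 and 3 bits respectively
assert code_lengths([0.5, 0.25, 0.125, 0.125]) == [1, 2, 3, 3]
```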


        Leaf 1            Leaf 2            Leaf 3             Leaf 4
      A=STATIC,          A=STATIC,      A=IRREGULAR(1)        A=STATIC,
    B=IRREGULAR(2)   B=READ(LTABLE,3)      B=STATIC           B=STATIC

         ---               ---               ---                 ---
        | 3%|             | 8%|             | 5%|               |54%|
        | 4 |             | 3 |             | 2 |               | 1 |
         ---               ---               ---                 ---
          | X             / D \               | X               / 0
          |              /     \              |                /
          |             /       \             |               /
         ---         ---         ---         ---             /
        | 6%|       | 8%|       | 8%|       |10%|           /
        | 2 |       | 2 |       | 1 |       | 1 |          /
         ---         ---         ---         ---          /
          | X         | X         1 \       / 0          /
          |           |              \     /            /
          |           |               \   /            /
         ---         ---               ---            /
        |12%|       |16%|             |18%|          /
        | 1 |       | 1 |             | 1 |         /
         ---         ---               ---         /
          1 \       / 0               / 0         /
             \     /                 /           /
              \   /                 /           /
               ---                 /           /
              |28%|               /           /
              | 1 |              /           /
               ---              /           /
                1 \            /           /
                   \          /           /
                    \        /           /
                     \      /           /
                      \    /           /
                        ---           /
                       |46%|         /
                       | 1 |        /
                        ---        /
                         1 \      /
                            \    /
                             \  /
                             ----
                            |100%|
                            | 1  |
                             ----

           Figure 3 : An example of a Hierarchical Huffman tree


9.2.  Compressor operation

   This section describes the steps of operation within the EPIC header
   compressor. This will be done in 3 steps: the selection of the

   relevant HH tree and leaf node, the compression itself, and the
   addition of the uncompressed fields.

9.2.1.  Step 1: Compression front end

   The input to the EPIC compressor is a header from the protocol stack
   to be compressed. The compressor maintains a number of contexts as
   described in Section 5.1.3 of [ROHC5], and the correct context to use
   should be selected depending on the uncompressed header.

   The uncompressed header should then be divided up into fields. As per
   [ROHC5] Section 5.7, the term hdr(X) will refer to the value of Field
   X in the uncompressed header.

   Next the compressor chooses an encoding method from the EPIC toolbox
   for each field in the protocol stack (for example the IP-ID might be
   encoded using the "IRREGULAR" function etc.) from the set of
   possibilities defined by the inputset. The choice of encoding methods
   can then be mapped onto a compressed header format. There is one
   compressed header format for every possible way the fields can be
   encoded (unless the number of formats is limited due to memory
   constraints, in which case a DYNAMIC packet can be sent instead).

   In general the only requirement is that the format must be capable of
   transmitting enough information to reconstruct the header at the
   decompressor. If more than one suitable compressed header format is
   available then the compressor may choose any of these.

9.2.2.  Step 2: Compressing the uncompressed header

   The compression process works down the Huffman tree, making decisions
   at each node based on the choice of encoding method for each field.

   The process finishes when there are no more branches to follow. The
   result is a string of integers (known as codewords), each of which
   contains ALIGN bits of the compressed header.

9.2.3.  Step 3: Adding the UNCOMPRESSED fields

   The third step is to add the uncompressed fields to the end of the
   compressed header. The final task is to encapsulate the EPIC
   compressed header within a standard ROHC packet as illustrated in
   Figure 1. The packet with compressed header is then ready to be
   transmitted.

9.3.  Decompressor operation

   This section describes the steps of operation within the EPIC header
   decompressor.

9.3.1.  Step 1: Decompressing the compressed header

   Each codeword is extracted from the received compressed header.
   Starting at the root of the Huffman tree, branches are followed up
   the tree and decisions are taken at each node based on the current
   codeword. When the top of the tree is reached, the leaf node
   indicates which compressed header format has been used. This in turn
   gives the encoding method that has been used for every field in the
   header.

   The encoding methods are then used to determine the format of each
   original field, from which the original field values can be
   reconstructed. This process continues until all codewords have been
   processed.

9.3.2.  Step 2: Reconstructing the INFERRED fields

   The final step at the decompressor is to reconstruct the UNCOMPRESSED
   and INFERRED fields from the transmitted fields and/or the context.
   In general the reconstruction process is specific to the individual
   fields.

   The normative pseudocode description of EPIC is given in Appendix A.


10. Security Considerations

   EPIC generates compressed header formats for direct use in ROHC
   profiles. Consequently the security considerations for EPIC match
   those of [ROHC5].


11. Acknowledgements

   Header compression schemes from [ROHC5] have been important sources
   of ideas and knowledge. Basic Huffman encoding [HUFF] was enhanced
   for the specific tasks of robust, efficient header compression.

      Thanks to

         Carsten Bormann (cabo@tzi.org)
         Max Riegel (maximilian.riegel@icn.siemens.de)

      for valuable input and review.


12. Intellectual Property Considerations

   This proposal is in full conformity with [RFC-2026].

   Siemens may have patent rights on technology described in this
   document which employees of Siemens contribute for use in IETF
   standards discussions. In relation to any IETF standard incorporating
   any such technology, Siemens hereby agrees to license on fair,
   reasonable and non-discriminatory terms, based on reciprocity, any
   patent claims it owns covering such technology, to the extent such
   technology is essential to comply with such standard.

13. References

   [ROHC5]       "RObust Header Compression (ROHC)", Carsten Bormann et
                 al, <draft-ietf-rohc-rtp-05.txt>, Internet Engineering
                 Task Force, October 23, 2000

   [HUFF]        "The Data Compression Book", Mark Nelson and Jean-Loup
                 Gailly, M&T Books, 1995

   [IMPL]        http://www.roke.co.uk/networks/epic/epic.html

   [RFC-2026]    "The Internet Standards Process - Revision 3", Scott
                 Bradner, Internet Engineering Task Force, October 1996

   [RFC-2119]    "Key words for use in RFCs to Indicate Requirement
                 Levels", Scott Bradner, Internet Engineering Task
                 Force, March 1997


14.  Authors' Addresses

   Richard Price        Tel: +44 1794 833681
   Email:               richard.price@roke.co.uk

   Robert Hancock       Tel: +44 1794 833601
   Email:               robert.hancock@roke.co.uk

   Stephen McCann       Tel: +44 1794 833341
   Email:               stephen.mccann@roke.co.uk

   Mark A West          Tel: +44 1794 833311
   Email:               mark.a.west@roke.co.uk

   Abigail Surtees      Tel: +44 1794 833131
   Email:               abigail.surtees@roke.co.uk

   Roke Manor Research Ltd
   Romsey, Hants, SO51 0ZN
   United Kingdom


   Christian Schmidt    Tel: +49 89 722 27822
   Email:               christian.schmidt@icn.siemens.de

   Siemens ICM N MR MA1
   Munich
   Germany

Appendix A.  EPIC Description in Pseudocode

   Chapter 9 presented an outline description of the operation of the
   EPIC scheme. This appendix goes on to describe the complete EPIC
   scheme in pseudocode.

   As stated in Section 9, there are four main aspects to EPIC:

   Description of the structure of the Hierarchical Huffman (HH) tree
   Compressor operation
   Decompressor operation
   Tree construction

   The final section of the appendix describes certain implementation
   issues.

A.1.  Structure of the Hierarchical Huffman tree

   As mentioned in Section 9, the HH tree consists of a set of nodes
   joined by branches. Conventionally, the nodes are arranged vertically
   with leaf nodes at the top and the root node at the bottom.

   There is a single root node, which is the starting point for
   decompression operations. There is a set of leaf nodes, which are
   used as the starting points for compression operations. Each possible
   compressed header format is associated with a specific leaf node. All
   other nodes are interior nodes.

   The format indicator for a leaf node is in fact just a list of
   encoding methods, one for each field in the header. In what follows,
   we notate the encoding for a single field as "F=E", for a field
   called "F" using encoding method "E". The complete format indicator
   will then look like {F1=E1, F2=E2, ...}. Although we start off in the
   inputset with encoding methods expressed recursively, by the time the
   tree has been constructed all the Ei are actually toolbox encoding
   methods.

   No special information is associated with any of the nodes.
   (Information is associated with each node while the tree is being
   constructed, but this can all be discarded once the construction is
   complete.) All the information is contained in the tree topology (the
   connectivity of nodes and branches) and labels on each branch.

   There are three possible cases for branches between nodes at
   different levels:

   i)   A single vertical branch between a pair of nodes. Such a branch
        is labeled {X}.
   ii)  A pair of branches joining one upper node to two lower ones.
        These branches are labeled {D t} for some integer t (different
        for each branch).
   iii) Several branches each joining one upper node to a common lower
        node. Each branch is labeled with {t} for some integer t
        (different for each branch).

   Note that this "tree" can actually be multiply connected, i.e. there
   may be several alternative paths between any given leaf node and the
   root. The correct route to follow is determined by the compression
   and decompression algorithms.

   As well as the tree itself, the following input parameters are needed
   to begin the compression and decompression process:

   ALIGN:       Number of bits for alignment (i.e. size of one
                codeword in the compressed header).
   NPATTERNS:   Number of bit patterns available for use by EPIC in the
                first codeword of the compressed header.
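   The draft does not mandate any particular in-memory representation of
   the tree. One possible non-normative sketch (the names Node and
   Branch are our own) that captures the three branch label cases is:

```python
# One possible representation of the HH tree (an assumption; the draft
# only defines the tree topology and branch labels, not a structure).
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Branch:
    kind: str                        # "X", "D", or "NUM" ({t}) branch
    t: int = 0                       # label value for {D t} / {t} branches
    lower: Optional["Node"] = None   # neighboring node nearer the root

@dataclass
class Node:
    name: str
    format_indicator: Optional[list] = None   # set only on leaf nodes
    down: list = field(default_factory=list)  # branches toward the root
```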

A.2.  Compressor

   This section describes the EPIC header compressor.

A.2.1.  Step 1: Compression front end

   The input to the EPIC compressor is a header from the protocol stack
   to be compressed. The compressor maintains a number of contexts as
   described in Section 5.1.3 of [ROHC5], and the correct context to use
   should be selected depending on the uncompressed header.

   The uncompressed header should then be divided up into fields. As per
   [ROHC5] Section 5.7, the term hdr(X) will refer to the value of Field
   X in the uncompressed header.

   The compressor chooses a compressed header format from the possible
   set (where each compressed header format corresponds to a different
   leaf node in the HH tree). In general the format must be capable of
   transmitting enough information to reconstruct the header at the
   decompressor. If more than one suitable format is available then the
   compressor may choose any of these.

   The compressor then calculates a non-negative integer M that contains
   all of the non-redundant information from the uncompressed header. M
   is calculated using the format indicator for the chosen leaf node.
   The following steps are taken:

   1    Set M=0

   2    From the left, search the format indicator for the next instance
        of "READ" or "WRITE" encoding methods.

        while (A new instance of "READ" or "WRITE" can be found)
        {
             i)      Let F denote the corresponding field and let i
                     denote the corresponding parameter (so the format
                     indicator contains "F=READ(LTABLE,i)" or
                     "F=WRITE(k,LTABLE,i)" for some k,LTABLE).
             ii)     Let M=M*i
             iii)    Let M=M+index where index is an integer from 0 to
                     i-1. It denotes the position of hdr(F) in the
                     lookup table and is fixed for READ but chosen by
                     the compressor for WRITE.
        }

   3    From the left, search the format indicator for the next instance
        of "LSB", "IRREGULAR" or "WRITE".

        while (A new instance of "LSB", "IRREGULAR" or "WRITE" encoding
        methods can be found)
        {
             i)      Let F denote the corresponding field and let k
                     denote the corresponding parameter (so the format
                      indicator contains "F=LSB(k,p)", "F=IRREGULAR(k)"
                      or "F=WRITE(k,LTABLE,i)").
             ii)     Let M=M*2^k
             iii)    Let M=M+value, where value is from 0 to 2^k - 1.
                     For IRREGULAR and WRITE encoding it is hdr(F) in
                     integer form. For LSB encoding it is the LSB of
                     hdr(F) as described in Section 4.5.1 of [ROHC5].
        }


   The end result is the M value for the header, along with the
   identified starting leaf node.
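   The two loops above can be sketched in Python. This is a non-
   normative illustration: the tuple representation of the format
   indicator, the field names ("SN", "IP-ID") and the "<field>.index"
   convention for READ/WRITE table indices are our own assumptions, and
   the LTABLE access itself is omitted.

```python
# A sketch (not normative) of the M calculation from Section A.2.1.

def calculate_m(format_indicator, values):
    """format_indicator: list of (field, method, params); values: ints."""
    m = 0
    # Step 2: fold in lookup-table indices for READ/WRITE (left to right)
    for f, method, params in format_indicator:
        if method in ("READ", "WRITE"):
            m = m * params["i"] + values[f + ".index"]
    # Step 3: fold in the k-bit values for LSB/IRREGULAR/WRITE
    for f, method, params in format_indicator:
        if method in ("LSB", "IRREGULAR", "WRITE"):
            m = m * (1 << params["k"]) + values[f]
    return m

fmt = [("SN", "LSB", {"k": 4}), ("IP-ID", "IRREGULAR", {"k": 16})]
m = calculate_m(fmt, {"SN": 0b1010, "IP-ID": 0x1234})
# m == (0b1010 << 16) + 0x1234
```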

A.2.2.  Step 2: Compressing the uncompressed header

   Working from the M value and start node, the encoding process creates
   a sequence {codeword(i)} of codewords as follows:

   1    Begin at the start node in the tree corresponding to the header
   format. Set i=1.

   2    Follow the branches of the tree downwards towards the root,
        taking the following steps at each node:

        a)      If the next branch label is {X} then:

                i)      codeword(i) = M modulo 2^ALIGN, i=i+1
                ii)     set M = integer part (M / 2^ALIGN)

        b)      If there are 2 branches with labels {D N1}, {D N2} then:

                i)      d = absolute difference between N1 and N2
                ii)     if M < d then follow branch to left
                iii)    else set M = M - d and follow right branch

        c)      If the next branch label is a number {L} then:

                i)      codeword(i) = L + M, i=i+1
                ii)     set M = 0

   3    Finish when there are no more branches to follow, set
        NCODEWORDS=i (i.e. NCODEWORDS is the total number of codewords
        in the compressed header).

   4    Reverse the list of codewords, so codeword(1) is the last
        codeword produced, and codeword(NCODEWORDS) is the first.

   This produces a string of codeword numbers, where each codeword
   represents ALIGN bits in the EPIC compressed header. For the most
   likely header formats this string may consist of a single codeword.

   It is a property of EPIC that the following hold:
   For i=1:                  0 <= codeword(i) < NPATTERNS
   For i=2, ..., NCODEWORDS: 0 <= codeword(i) < 2^ALIGN

   This means that the compressed header can be very simply transmitted
   with implicit codeword alignment. The values to be placed in the
   first codeword are limited to be numerically less than NPATTERNS,
   which means that part of this codeword can be used for other
   purposes.
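   To make the walk concrete, the following non-normative sketch encodes
   against a small hand-built tree with ALIGN=4 (our own toy example,
   not derived from any real inputset): leaf A has 3 codepoints and leaf
   B has 20, with B split by a {D 0}/{D 4} branch pair, an {X} branch
   below the larger half, and numeric branches {0}, {3} and {7} meeting
   at the root.

```python
# Toy HH tree, ALIGN = 4. Each entry gives a node's single downward step:
#   ("X", next)            - {X} branch: emit M mod 2^ALIGN, divide M
#   ("D", d, left, right)  - {D 0}/{D d} branch pair: route on M
#   ("NUM", t, next)       - numeric branch {t}: emit t + M, reset M
ALIGN = 4
TREE_DOWN = {
    "A":   ("NUM", 0, "root"),      # 3 codepoints, root patterns 0-2
    "B":   ("D", 4, "Br", "Bj"),    # 20 codepoints, split 4 + 16
    "Br":  ("NUM", 3, "root"),      # root patterns 3-6
    "Bj":  ("X", "Bj2"),            # 16 codepoints -> one ALIGN-bit word
    "Bj2": ("NUM", 7, "root"),      # root pattern 7
}

def epic_compress(leaf, m):
    """Walk from a leaf down to the root, emitting codewords (step 2)."""
    codewords = []
    node = leaf
    while node != "root":
        step = TREE_DOWN[node]
        if step[0] == "X":
            codewords.append(m % (1 << ALIGN))
            m >>= ALIGN
            node = step[1]
        elif step[0] == "D":
            d = step[1]
            if m < d:
                node = step[2]           # left branch {D 0}
            else:
                m -= d                   # right branch {D d}
                node = step[3]
        else:                            # numeric branch {t}
            codewords.append(step[1] + m)
            m = 0
            node = step[2]
    codewords.reverse()                  # step 4: reverse the list
    return codewords

# e.g. format B with M = 10 compresses to the codewords [7, 6]
```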

A.2.3.  Step 3: Adding the UNCOMPRESSED fields

   The last step is to add the UNCOMPRESSED fields to the end of the
   compressed header. This is accomplished as follows:

   1    From the left, search the format indicator for the next instance
        of "UNCOMPRESSED".

        while (A new instance of "UNCOMPRESSED" can be found)
        {
             i)      Let F denote the corresponding field.
             ii)     Add hdr(F) to the compressed header.
        }

   If the compressed header is no longer a multiple of ALIGN bits then
   it should be padded by adding "0" bits onto the end.

   The final task is to encapsulate the EPIC compressed header within a
   standard ROHC packet as described in Section 3.2. The packet with
   compressed header is then ready to be transmitted.
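   The final assembly step can be sketched as bit-string concatenation
   (a non-normative illustration, assuming ALIGN = 4; the field list and
   widths are made up for the example):

```python
# Codewords and UNCOMPRESSED field values are concatenated as bit
# strings and padded with "0" bits to a multiple of ALIGN bits.
ALIGN = 4

def pack_header(codewords, uncompressed_fields):
    """codewords: ALIGN-bit ints; uncompressed_fields: (value, nbits)."""
    bits = ""
    for cw in codewords:
        bits += format(cw, f"0{ALIGN}b")
    for value, nbits in uncompressed_fields:
        bits += format(value, f"0{nbits}b")
    if len(bits) % ALIGN:                # pad with "0" bits
        bits += "0" * (ALIGN - len(bits) % ALIGN)
    return bits

# two codewords plus a 6-bit UNCOMPRESSED field: 14 bits, padded to 16
# pack_header([7, 6], [(0b101010, 6)]) == "0111" "0110" "101010" "00"
```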

A.3.  Decompressor

   This section describes the EPIC header decompressor.

A.3.1.  Step 1: Decompressing the compressed header

   The list C=(codeword(1), ... codeword(NCODEWORDS)) can be read
   directly from the compressed header. Note that the integer NCODEWORDS
   (i.e. the length of the compressed header excluding UNCOMPRESSED
   fields) is not yet known, but instead will be determined when the
   compressed header is parsed.

   The list C and the Huffman tree are used to determine the compressed
   header format and the M value as follows:

   1)   Set M = 0 and i = 1

   2)   Begin at root of tree (single node) and follow the branches up
        the tree taking the following steps at each node:

        a)      If next branch label is {X} then:

                i)      set M = M * 2^ALIGN
                ii)     set M = M + codeword(i)
                iii)    i=i+1

        b)      If next branch label is {D t} then:

                i)      if t = 0 just continue up the tree
                ii)     else set M = M + t and continue up tree

        c)      If there is > 1 branch, each with a label {t} then:

                i)      set c = codeword(i), i=i+1
                ii)     find neighboring pair of branches with labels
                        {L} and {U} such that L <= c < U (if c >= number
                        on every branch then take L = number on branch
                        with greatest number)
                iii)    set M = c - L
                iv)     continue up tree on branch labeled {L}

   3)   End up at a leaf node of the tree corresponding to a particular
        compressed header format and value M

   The value M and the header format can then be provided to the
   decompression front end of Section A.3.2. The value NCODEWORDS does
   not need to be transmitted explicitly, since the process terminates
   of its own accord when a leaf node of the tree is reached.
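   A non-normative sketch of the upward walk, against a small hand-built
   tree with ALIGN=4 (our own toy example): the root offers numeric
   branches {0}, {3} and {7}, and the nodes above them contribute {X}
   and {D t} steps before a leaf (format A or B) is reached.

```python
ALIGN = 4
# Toy tree seen from the root. codeword(1) selects one of the numeric
# branches; every other node has a single upward step.
ROOT_BRANCHES = [(0, "A"), (3, "Br"), (7, "Bj2")]   # sorted by label
TREE_UP = {
    "Bj2": ("X", "Bj"),      # {X}: shift in the next codeword
    "Bj":  ("D", 4, "B"),    # right branch of a {D 0}/{D 4} pair
    "Br":  ("D", 0, "B"),    # left branch of the same pair
    "A":   None,             # leaf: format A (3 codepoints)
    "B":   None,             # leaf: format B (20 codepoints)
}

def epic_decompress(codewords):
    """Walk from the root up to a leaf, recovering (format, M)."""
    it = iter(codewords)
    c = next(it)
    # pick the branch with the greatest label <= c (case c of step 2)
    label, node = max(b for b in ROOT_BRANCHES if b[0] <= c)
    m = c - label
    while TREE_UP[node] is not None:
        step = TREE_UP[node]
        if step[0] == "X":
            m = (m << ALIGN) + next(it)
            node = step[1]
        else:                            # {D t}: add t to M
            m += step[1]
            node = step[2]
    return node, m

# e.g. the codewords [7, 6] decompress to format "B" with M = 10
```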

A.3.2.  Step 2: Decompression front end

   The decompressor begins with the non-negative integer M which is then
   translated back into a value hdr(X) for each field X using the format
   indicator. The following steps are taken:

   1    Begin with M

   2    From the right, search the format indicator for the next
        instance of "LSB", "IRREGULAR" or "WRITE" encoding methods.

        while (A new instance of "LSB", "IRREGULAR" or "WRITE" can be
        found)
        {
             i)      Let F denote the corresponding field and let k
                     denote the corresponding parameter (so the format
                      indicator contains "F=LSB(k,p)", "F=IRREGULAR(k)"
                      or "F=WRITE(k,LTABLE,i)").
             ii)     Let value=M mod 2^k (so value is an integer from 0
                      to 2^k - 1). For IRREGULAR and WRITE encoding let
                      hdr(F) be the k bits corresponding to value. For
                      LSB encoding let hdr(F) be context(F) with the k
                      least significant bits at offset p replaced by
                      value as described in Section 4.5.1 of [ROHC5].
             iii)    M=(M-value)/(2^k).
        }

   3    From the right, search the format indicator for the next
        instance of "READ" or "WRITE" encoding methods.

        while (A new instance of "READ" or "WRITE" can be found)
        {
             i)      Let F denote the corresponding field and let i
                     denote the corresponding parameter (so the format
                     indicator reads "F=READ(LTABLE,i)" or
                     "F=WRITE(k,LTABLE,i)" for some k,LTABLE).
             ii)     Let index=M mod i (so index is an integer from 0 to
                     i - 1). For READ encoding let hdr(F) be the value
                     stored in LTABLE at the position specified by
                     index. For WRITE encoding store the value hdr(F) in
                     LTABLE at the position specified by index.
             iii)    M=(M-index)/i.
        }

   4    From the right, search the format indicator for the next
        instance of "STATIC" or "VALUE" encoding methods.

        while (A new instance of "STATIC" or "VALUE" can be found)
        {
             i)      Let F denote the corresponding field (so the format
                     indicator reads "F=STATIC" or "F=VALUE(v)" for some
                     bit pattern v).
             ii)     For STATIC encoding let hdr(F)=context(F). For
                     VALUE encoding let hdr(F)=v.
        }

   The end result is that hdr(F) has been determined for all fields
   other than UNCOMPRESSED and INFERRED fields.
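   The two peeling loops above invert the compressor's M calculation. A
   non-normative sketch, using an illustrative (field, method, params)
   tuple representation of the format indicator (our own convention;
   READ/WRITE indices are returned under the key "<field>.index", and
   the LTABLE lookup/store itself is omitted):

```python
# A sketch (not normative) of the inverse M calculation.

def reconstruct_fields(m, format_indicator):
    """Recover the field values folded into M, right to left."""
    values = {}
    # Step 2: peel off the k-bit LSB/IRREGULAR/WRITE values
    for f, method, params in reversed(format_indicator):
        if method in ("LSB", "IRREGULAR", "WRITE"):
            k = params["k"]
            values[f] = m % (1 << k)
            m //= 1 << k
    # Step 3: peel off the READ/WRITE lookup-table indices
    for f, method, params in reversed(format_indicator):
        if method in ("READ", "WRITE"):
            i = params["i"]
            values[f + ".index"] = m % i
            m //= i
    return values

fmt = [("SN", "LSB", {"k": 4}), ("IP-ID", "IRREGULAR", {"k": 16})]
# reconstruct_fields((0b1010 << 16) + 0x1234, fmt)
#   == {"IP-ID": 0x1234, "SN": 0b1010}
```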

A.3.3.  Step 3: Reconstructing the INFERRED fields

   The final step at the decompressor is to reconstruct the UNCOMPRESSED
   and INFERRED fields from the transmitted fields and/or the context.

   In general the reconstruction process is specific to the individual
   fields (and is implemented using a separate subroutine for each). For
   the UNCOMPRESSED fields it is necessary to know the lengths of each
   field, so that the information can be retrieved from the end of the
   compressed header. For the INFERRED fields the procedure for
   calculating the actual value of the field must be known.

A.4.  Hierarchical Huffman tree construction

   This section describes how EPIC converts an inputset into a new set
   of compressed header formats. We first introduce basic concepts and
   notation that are used in the construction of the HH tree.

   An FNP table is a 3-column table which represents information about
   the behavior of a particular data item (e.g. a single field, group of
   fields, or a complete protocol stack). Each row of the table is
   called a format and contains the following:

   A format indicator (F): This ties the row to a particular format of
   the data item;
   A number of codepoints (N): This is the number of codepoints that is
   needed to represent the compressed form of the data item when it has
   this format (N measures the amount of non-redundant information in
   the data item);
   A scaled probability (P): This represents the probability of the data
   item having this format, divided by N.

   An FNP table has associated with it a total number of codepoints,
   CODEPOINTS(table), which is just the sum of the N values for all the
   rows in the table. The set of data formats given in the table MUST be
   "exhaustive", which means that the sum of N*P over the table MUST be
   unity. This ensures that the table contains every possible format for
   the data item. The order of the rows in the table is significant.

   There are two types of FNP tables: fundamental tables and constructed
   tables.

   Fundamental FNP tables correspond to fields described by a single
   item in the EPIC toolbox. The FNP tables for all of these types are
   listed in the following table. Each FNP table has a single row, so
   N*P=1 directly (and hence the FNP table is exhaustive).

   +---------------------------+---------+-----------+
   |  F                        |    N    |     P     |
   +---------------------------+---------+-----------+
   | "Field1=STATIC"           |    1    |     1     |
   +---------------------------+---------+-----------+
   | "Field1=IRREGULAR(k)"     |   2^k   |   1/2^k   |
   +---------------------------+---------+-----------+
   | "Field1=VALUE(v)"         |    1    |     1     |
   +---------------------------+---------+-----------+
   | "Field1=READ(LTABLE,i)"   |    i    |    1/i    |
   +---------------------------+---------+-----------+
   | "Field1=WRITE(k,LTABLE,i)"|  i*2^k  | 1/(i*2^k) |
   +---------------------------+---------+-----------+
   | "Field1=LSB(k,p)"         |   2^k   |   1/2^k   |
   +---------------------------+---------+-----------+
   | "Field1=UNCOMPRESSED"     |    1    |     1     |
   +---------------------------+---------+-----------+
   | "Field1=INFERRED"         |    1    |     1     |
   +---------------------------+---------+-----------+

   Note that the name "Field1" should be replaced by the correct name
   for the field (e.g. "RTP Timestamp", "ESP Sequence Number" etc.).
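   The table above can be expressed as a small function giving N and P
   for each toolbox method (a non-normative sketch; Fraction keeps the
   exhaustiveness check N*P = 1 exact):

```python
# Codepoint counts N and scaled probabilities P for the fundamental FNP
# tables, as functions of the encoding parameters k and i.
from fractions import Fraction

def fundamental_fnp(method, k=0, i=1):
    n = {
        "STATIC": 1, "VALUE": 1, "UNCOMPRESSED": 1, "INFERRED": 1,
        "IRREGULAR": 2 ** k, "LSB": 2 ** k,
        "READ": i, "WRITE": i * 2 ** k,
    }[method]
    return n, Fraction(1, n)

# every fundamental table has a single row with N * P = 1
for method in ("STATIC", "IRREGULAR", "VALUE", "READ", "WRITE", "LSB"):
    n, p = fundamental_fnp(method, k=4, i=3)
    assert n * p == 1
```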

A.4.1.  Step 1: Process the EPIC inputset

   The input to the packet format generator is a list of general
   parameters and field-specific parameters, referred to as the inputset
   (and described in Chapter 5).

   The parameters used in the packet format generator stage are as
   follows. For the general parameters:

   ALIGN:       Number of bits for alignment (i.e. size of one
                codeword in the compressed header)
   NPATTERNS:   Number of bit patterns available for use by EPIC in the
                first codeword of the compressed header
   MAXFORMATS:  Maximum number of compressed header formats (optional)
   PTHRESH:     Minimum probability threshold (optional)

   In addition, we need to derive the following quantities:

   AFORMATS:    Actual number of header formats used
   PMISSING:    The total probability of all formats discarded during
                tree FNP-table generation. (PMISSING is not actually
                used during the tree construction, but can be useful
                in validation checks during the process.)

   The field-specific parameters are the encoding methods for the
   headers which the profile is intended to compress. The information in
   these encoding methods is used to make successively more complex FNP-
   tables following either of the SUM or PRODUCT styles described in the
   next section. The end result is the top-level FNP-table which
   describes the complete header stack for the profile. This is suitable
   input for the HH algorithm.

A.4.2.  Step 2: Resolve complex encoding methods

   Constructed FNP tables are used to represent more complex data
   behavior. Complexity is introduced either by allowing more than one
   behavior for a single item, or by considering the behavior of
   multiple data items. These two methods correspond to two styles for
   constructing FNP tables, which we refer to as SUM-style or PRODUCT-
   style respectively. In each case the construction process implicitly
   defines the order of the rows in the constructed table, based on the
   information that describes the construction and the order of the rows
   in each constituent table.

   SUM-style construction is typically used to describe a variety of
   possible behaviors of a single data item (e.g. 80% of the time
   static, remaining 20% alternating between a set of values). SUM-
   style construction takes as input a sequence {TABLE(i), P(i)}, where
   each TABLE is a known FNP table for the data item, and each P(i) is
   the unscaled probability that the behavior of the data item being
   described by TABLE(i) occurs (so the sum over the sequence of P(i)
   is unity). The new table is constructed using the following
   mechanism:

   Initialize the constructed table to be empty
   for each element i of the sequence {TABLE(i), P(i)}
   {
      for each row in TABLE(i)
      {
         Add a new row to the constructed table with
               the following entries:
             F = Format from row of TABLE(i)
             N = corresponding entry for N from row of TABLE(i)
             P = P(i)*corresponding entry for P from row of TABLE(i)
      }
   }

   The number of codepoints in the constructed table is the sum of the
   numbers of codepoints from all the constituent tables, and similarly
   for the number of rows in the table.
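   The SUM-style mechanism above can be sketched as follows (non-
   normative; an FNP table is modeled as a list of (F, N, P) rows, with
   Fraction keeping the probabilities exact):

```python
# SUM-style FNP table construction, following the pseudocode above.
from fractions import Fraction

def sum_tables(parts):
    """parts: sequence of (table, p_i) pairs; the p_i must sum to 1."""
    out = []
    for table, p_i in parts:
        for f, n, p in table:
            out.append((f, n, p_i * p))
    return out

# e.g. a field that is STATIC 80% of the time, IRREGULAR(4) the rest:
static_t = [("F1=STATIC", 1, Fraction(1))]
irreg_t = [("F1=IRREGULAR(4)", 16, Fraction(1, 16))]
combined = sum_tables([(static_t, Fraction(4, 5)), (irreg_t, Fraction(1, 5))])
assert sum(n * p for _, n, p in combined) == 1   # still exhaustive
```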

   The second type of construction is PRODUCT-style. This type of
   construction is used to describe a data item composed of more than
   one field. It takes as an input a simple sequence of {TABLE(i)},
   where each TABLE(i) is a known FNP table, and there are NT tables in
   all. The new table is constructed using the following mechanism:

   Initialize the constructed table to be empty
   for each row i1 of TABLE(1)
   {
      for each row i2 of TABLE(2)
      {
         ...
            for each row iNT of TABLE(NT)
            {
               Add a new row to the constructed table with the
                     following entries:
                   F = {Format i1 of TABLE(1), Format i2 of
                        TABLE(2), ... , Format iNT of TABLE(NT)}
                   N = [N(i1) of TABLE(i1)]* ... *[N(iNT) of TABLE(iNT)]
                   P = [P(i1) of TABLE(i1)]* ... *[P(iNT) of TABLE(iNT)]
            }
         ...
      }
   }

   The number of codepoints in the constructed table is the product of
   the numbers of codepoints from all the constituent tables, and
   similarly for the number of rows in the table.
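   The PRODUCT-style mechanism can be sketched in the same (F, N, P) row
   representation (non-normative; itertools.product yields rows in the
   required order, with rows of TABLE(1) varying slowest):

```python
# PRODUCT-style FNP table construction over several fields, following
# the pseudocode above.
from fractions import Fraction
from itertools import product

def product_tables(tables):
    out = []
    for rows in product(*tables):
        f = [r[0] for r in rows]         # combined format indicator
        n, p = 1, Fraction(1)
        for _, ni, pi in rows:
            n, p = n * ni, p * pi        # N and P multiply across fields
        out.append((f, n, p))
    return out

t1 = [("F1=STATIC", 1, Fraction(1))]
t2 = [("F2=LSB(2,0)", 4, Fraction(1, 4))]
top = product_tables([t1, t2])
assert sum(n * p for _, n, p in top) == 1   # still exhaustive
```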

   The two types of construction steps can be repeated indefinitely to
   build up progressively more complex data formats. The basis is always
   the set of toolbox formats. The end result is known as the top-level
   FNP table, which describes the whole protocol stack to be compressed.

A.4.3.  Step 3: Tree pruning

   Rows are now pruned from the top-level FNP table and alignment is
   enforced using the following procedure.

   1. Sort the rows of the table in order of decreasing N*P. Where there
      are ties in N*P, preserve the original ordering of the table.

   2. If PTHRESH is given, discard all rows from the table for which N*P
      < PTHRESH, and add the values of N*P to the variable PMISSING.

   3. If MAXFORMATS is given, discard all rows from the table other than
      the first MAXFORMATS-1, and add the values of N*P to PMISSING.
      Then re-order the complete set of formats using the original
      ordering of encodings (ignoring probability values).

   4. Set AFORMATS = (number of rows remaining in the table) + 1, and
      NPOINTS = sum of all N values for all rows in the table.

   5. Create a new terminal row number AFORMATS, with
      P = 0
      N = (NPATTERNS - NPOINTS) mod (2^ALIGN-1)
      The format indicator for this row is not used.

   Note that in practice, it is not necessary to exhaustively generate
   the completed table before pruning starts (i.e. some short cuts are
   possible). However, for interoperability, care must be taken to
   preserve the order that the standard construction method would
   generate.

   We complete this stage with a complete set of starting leaf nodes for
   the tree, with low probability formats discarded. The nodes have
   assigned N and P values and associated header format indicators.
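   The pruning procedure can be sketched as follows (non-normative,
   using the illustrative (F, N, P) row representation with exact
   Fraction probabilities; the terminal row's (2^ALIGN - 1) modulus is
   taken verbatim from step 5):

```python
# A sketch of steps 1-5 above. Python's sort is stable, so ties in N*P
# keep the original construction order, as step 1 requires.
from fractions import Fraction

def prune(table, npatterns, align, pthresh=None, maxformats=None):
    """table: list of (F, N, P) rows in construction order."""
    indexed = list(enumerate(table))        # remember construction order
    # 1. sort by decreasing N*P
    indexed.sort(key=lambda e: e[1][1] * e[1][2], reverse=True)
    pmissing = Fraction(0)
    # 2. discard rows below the probability threshold
    if pthresh is not None:
        pmissing += sum(n * p for _, (_, n, p) in indexed if n * p < pthresh)
        indexed = [e for e in indexed if e[1][1] * e[1][2] >= pthresh]
    # 3. keep at most MAXFORMATS-1 rows
    if maxformats is not None and len(indexed) > maxformats - 1:
        pmissing += sum(n * p for _, (_, n, p) in indexed[maxformats - 1:])
        indexed = indexed[:maxformats - 1]
    indexed.sort()                          # restore construction order
    rows = [row for _, row in indexed]
    # 4./5. terminal row absorbing the leftover codepoints
    npoints = sum(n for _, n, _ in rows)
    rows.append(("TERMINAL", (npatterns - npoints) % (2 ** align - 1), 0))
    return rows, pmissing
```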

A.4.4.  Step 4: Creating the HH Tree

   Creating a Hierarchical Huffman tree begins with a set of nodes, each
   one having a number of codepoints (N) and a probability per codepoint
   (P). During the creation of the tree new nodes may be created and
   given a number of codepoints and a probability. These are then
   incorporated into the tree (i.e. the new nodes join the active set
   out of which the tree is growing, while old nodes which are already
   part of the tree may be removed from this active set). The tree
   building finishes when the active set consists of a single node
   which contains all the codepoints and has probability 1.

   The format indicator of each leaf node is not used for tree
   construction (but instead must be stored for use in compression and
   decompression).

   We denote the nodes as NODE(k) for k=1, 2, 3 ...
   NODE(k) has number of codepoints N(NODE(k))=n[k]
   NODE(k) has probability per codepoint of P(NODE(k))=p[k]

   The initial nodes in the active set correspond to the formats from
   the top-level FNP table: taking on their order, number of codepoints
   and probabilities. Each codepoint corresponds to one possible header
   within the given header format. Conventionally we place these leaf
   nodes at the top of the tree and add new nodes working downwards.

   Note that the total probability of NODE(k) is n[k]*p[k], and we start
   with the total probability of all nodes in the active set being
   1 - PMISSING. As new nodes are added to the set and old ones removed
   or replaced, this property is conserved at each stage.

   As the tree is built up, nodes are joined by branches. There are
   three possible cases for branches between nodes at different levels:

   i)      A single vertical branch between a pair of nodes. Such a
           branch is labeled {X}.
   ii)     A pair of branches joining one upper node to two lower ones.
           These branches are labeled {D t} for some integer t
           (different for each branch).
   iii)    Several branches each joining one upper node to a common
           lower node. Each branch is labeled with {t} for some integer
           t (different for each branch).

   The tree generation algorithm is as follows:

   1. Set NNODES = AFORMATS.

   2. Find node j such that n[j] > 0 and p[j] is minimal (if there are
      equalities then use smallest j). If there is no such j, STOP.

   3. if (n[j] is divisible by 2^ALIGN)
      {
         /* Case A - Figure 4 */
        i)   replace NODE(j) in the active set with a new node j with
                new p[j] = 2^ALIGN * p[j]
                new n[j] = n[j] / 2^ALIGN
        ii)  join the new NODE(j) to original NODE(j)
        iii) label branch with {X}
      }
      else /* n[j] not divisible by 2^ALIGN */
      {
        if (n[j] > 2^ALIGN)
        {
           /* Case B - Figure 5 */
          i) create two new nodes, a replacement NODE(j) and NODE(r)
              with r=NNODES+1, and set:
                 p[r] = p[j]
                 n[r] = n[j] modulo 2^ALIGN
                 new p[j] = p[j]
                 new n[j] = n[j] - n[r]
          ii) join these nodes to original NODE(j) (left branch going
              to new NODE(r) and right branch going to NODE(j))
          iii)label branch to left with {D 0} and branch to right with
              {D n[r]}

          iv) set NNODES=NNODES+1
        }
        else /* so n[j] < 2^ALIGN */
        {
          /* Case C - Figure 6 */
          i)   find nodes k[x], x=1, 2, 3...,b with the following
                properties:
                all k[x] different and k[x] != j;
                p[k[x]] are minimal among the available nodes and
                increasing with x;
                 if there are ties among the p[k[x]], order the
                 affected nodes by increasing index k[x];
                b is minimal such that n[j]+n[k[1]]+...+n[k[b]]>2^ALIGN.
          ii)  define S[i] = n[j] + sum over (1<=x<=i-1) of n[k[x]]
          iii) create two new nodes, a replacement NODE(k[b]) and
                NODE(r) with r=NNODES+1, and set:
                   p[r] = p[k[b]]
                   n[r] = 2^ALIGN - S[b]
                   new p[k[b]] = p[k[b]]
                   new n[k[b]] = n[k[b]] - n[r]
          iv)  join NODE(k[b]) and NODE(r) to original NODE(k[b]) (left
                branch going to NODE(r) and right branch going to
                NODE(k[b]))
          v)   label branch to left with {D 0} and branch to right with
                {D n[r]}
          vi)  create a replacement for NODE(j) and set
                   new n[j] = 1
                   new p[j] = n[j]*p[j] + n[r]*p[r] +
                              sum(1<= x<= b-1) of n[k[x]]*p[k[x]]
          vii) join this node to the original NODE(j), to NODE(k[1]) to
                NODE(k[b - 1]) and NODE(r)
          viii)label branch to original NODE(j) with {0}
          ix)  for 1<=x<=b-1, label branch to NODE(k[x]) with {S[x]}
          x)   label branch to node r with {S[b]}
          xi)  remove NODE(k[1]) to NODE(k[b-1]) and NODE(r) from the
               active set
          xii) Set NNODES=NNODES+1

        }
      }


   4.  Return to 2

                             Former NODE(j)
                                {p, n}
                                   |
                                   | {X}
                                   |
                                   |
                               New NODE(j)
                   {p[j]=p*2^ALIGN, n[j]=n/2^ALIGN}

                  Figure 4 : Tree Building Case A


                             Former NODE(j)
                                {p, n}
                                  /\
                                 /  \
                                /    \
                               /      \
                              /        \
                        {D 0}/          \{D n[r]}
                            /            \
                           /              \
                          /                \
                        NODE(r)          New NODE(j)
      {p[r]=p, n[r]=n mod 2^ALIGN}     {n[j]=n-n[r], p[j]=p}

                  Figure 5 : Tree Building Case B

   Former        NODE(k[1])...NODE(k[b-1])       Former
   NODE(j)              |     |                NODE(k[b])
    \                   |     |                   /\
     \                  |     |                  /  \
      \                 |     |           {D 0} /    \ {D n[r]}
       \                |     |                /      \
        \               |     |               /        \
         \        {S[1]}|     |{S[b-1]}      /          \
          \             |     |          NODE(r)        New
           \            |     |          {p, n}      NODE(k[b])
            \           |     |           /
         {0} \          |     |          /
              \         |     |         /
               \        |     |        /
                \       |     |       /
                 \      |     |      /{S[b]}
                  \     |     |     /
                   \    |     |    /
                    \   |     |   /
                     \  |     |  /
                      New NODE(j)

                  Figure 6 : Tree Building Case C
           (See text for n and p values of created nodes)

   The HH tree is now ready for use in the compressor and decompressor.
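   As an informal illustration only (not part of the specification),
   the tree generation algorithm might be sketched as follows. The
   sketch simplifies the STOP condition to "a single node carrying one
   codepoint remains", uses ">=" rather than ">" when accumulating
   codepoints in case C (so that a group of exactly 2^ALIGN codepoints
   can also be merged), omits the NNODES bookkeeping, and assumes the
   input probabilities are consistent enough for every case C merge to
   gather 2^ALIGN codepoints.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass(eq=False)
class Node:
    p: float          # probability per codepoint
    n: int            # number of codepoints
    # branches joining this new (lower) node to the upper node(s) it
    # was built from, labeled as in Figures 4 to 6
    links: List[Tuple[str, "Node"]] = field(default_factory=list)

def build_hh_tree(formats, align):
    """formats: list of (p, n) pairs as in the top-level FNP table."""
    base = 1 << align
    active = [Node(p, n) for p, n in formats]
    while True:
        live = [nd for nd in active if nd.n > 0]
        if len(live) == 1 and live[0].n == 1:
            return live[0]                    # tree complete
        # Step 2: minimal p among live nodes, ties to the smallest index
        j = min(live, key=lambda nd: (nd.p, active.index(nd)))
        if j.n % base == 0:                   # Case A (Figure 4)
            active[active.index(j)] = Node(j.p * base, j.n // base,
                                           [("X", j)])
        elif j.n > base:                      # Case B (Figure 5)
            nr = j.n % base
            active[active.index(j)] = Node(j.p, j.n - nr,
                                           [(f"D {nr}", j)])
            active.append(Node(j.p, nr, [("D 0", j)]))
        else:                                 # Case C (Figure 6)
            ks, s = [], j.n
            for k in sorted((nd for nd in live if nd is not j),
                            key=lambda nd: (nd.p, active.index(nd))):
                ks.append(k)
                s += k.n
                if s >= base:                 # draft text says '>'
                    break
            kb = ks[-1]
            s_b = j.n + sum(k.n for k in ks[:-1])
            nr = base - s_b                   # codepoints taken from k[b]
            r = Node(kb.p, nr, [("D 0", kb)])
            active[active.index(kb)] = Node(kb.p, kb.n - nr,
                                            [(f"D {nr}", kb)])
            total = (j.n * j.p + nr * r.p
                     + sum(k.n * k.p for k in ks[:-1]))
            links, acc = [("0", j)], j.n
            for k in ks[:-1]:
                links.append((str(acc), k))
                acc += k.n
            links.append((str(acc), r))
            active[active.index(j)] = Node(total, 1, links)
            for k in ks[:-1]:
                active.remove(k)

# With ALIGN = 1 the construction is ordinary binary Huffman coding
root = build_hh_tree([(0.5, 1), (0.25, 1), (0.25, 1)], align=1)
```

   With ALIGN = 1 the sketch reduces to ordinary binary Huffman coding,
   which makes a convenient sanity check: the root gathers the whole
   probability mass.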

A.5.  EPIC implementation issues

   This section considers some issues that arise when implementing
   EPIC.

A.5.1.  Arithmetic

   The implementation of all stages of the EPIC algorithm can be
   performed using simple arithmetic operations on integer and floating
   point numbers. However, some aspects of the algorithm require care
   when being implemented on a real microprocessor.

   The compression/decompression front end works in terms of M values
   which are described in the pseudocode as large integers. However, the
   only operations required are very simple bit pattern manipulations:
   - left and right shift by ALIGN bits;
   - bitwise AND with (2^ALIGN-1);
   - addition to, subtraction from, and comparison with "ordinary"
   integers.
   Given that ALIGN will typically be 8, this represents very limited
   complexity and does not require specialized support in the data path.
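   As a minimal sketch (function names are ours, not the draft's),
   these operations suffice to split an M value into ALIGN-bit blocks
   and to rebuild it, using arbitrary-precision integers:

```python
ALIGN = 8
MASK = (1 << ALIGN) - 1            # bitwise AND with 2^ALIGN - 1

def pop_block(m):
    """Split the least significant ALIGN-bit block off an M value."""
    return m >> ALIGN, m & MASK    # (remaining M value, extracted block)

def push_block(m, block):
    """Append a block as the new least significant ALIGN bits."""
    return (m << ALIGN) | block

rest, block = pop_block(0x012345)  # rest = 0x0123, block = 0x45
```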

   The calculation of the Huffman tree is more involved, since the
   numbers of codepoints can be large integers and the probability
   values are conceptually floating point. The additional operations
   required are the multiplication and addition of these numbers with
   themselves and each other, and comparison of probability values.
   Support for large integer multiplication is not particularly
   problematic since it only has to be performed at tree generation time
   (which could even be an off-line activity); however, if rounding
   error leads to inconsistent floating point orderings then
   interoperability may be compromised.

   There are two solutions to this problem. Firstly, for a given (fixed)
   profile it is possible to construct the tree and also propagate upper
   bounds on the rounding error for the value at each node. It is then
   normally possible to verify that rounding cannot have modified the
   node selections made. This is particularly useful if tree pruning
   measures are taken to eliminate header formats with very small
   probabilities. Note that once the tree has been constructed the
   probability (and indeed codepoint numbers) for the tree nodes are no
   longer necessary, so this process can be carried out off-line if
   necessary.

   The second approach is to note that the only operations involving
   floating point values are multiplication and addition. Therefore, if
   the basic input probabilities (the scaled values in the fundamental
   FNP tables of the toolbox, and P values from profiles) are expressed
   explicitly as multiples of 1/2^m (for any m), the calculations can
   be carried out exactly using just the large integer operations which
   are required for the tree construction anyway. Note that scaled
   probabilities such as 1/v for arbitrary v cannot be exactly
   expressed in this way, which means that the summed N*P values for
   that and derived tables will no longer be precisely unity. However,
   this does not prevent the construction; it only modifies the
   validation checks that are appropriate at each stage. Since this
   case only arises in lookup-table encoding, where the value of v is
   small, profiles which adopt this approach should use
   the convention that the probability will be taken as (exactly)
   n/2^32, where n is the result of dividing 2^32 by v in integer
   arithmetic.
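   A sketch of the second approach (function names are ours):
   probabilities are held as integer numerators over an explicit
   power-of-two denominator, so multiplication and addition stay exact
   using only large-integer operations.

```python
SCALE = 32                         # probabilities as multiples of 1/2^32

def from_inverse(v):
    """1/v truncated to a multiple of 1/2^32 (integer division by v)."""
    return (1 << SCALE) // v, SCALE          # (numerator, fractional bits)

def p_mul(a, b):
    """Exact product: numerators multiply, fractional-bit counts add."""
    (na, fa), (nb, fb) = a, b
    return na * nb, fa + fb

def p_add(a, b):
    """Exact sum after aligning to the larger fractional-bit count."""
    (na, fa), (nb, fb) = a, b
    f = max(fa, fb)
    return (na << (f - fa)) + (nb << (f - fb)), f

third = from_inverse(3)            # 1431655765/2^32, not exactly 1/3
total = p_add(third, p_add(third, third))
# total is (2^32 - 1)/2^32: one unit short of unity, as the text notes
```

   Adding the truncated 1/3 three times lands one unit of 1/2^32 short
   of unity; this is exactly the kind of deviation the relaxed
   validation checks must tolerate.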



Appendix B.  Example Profile for RTP/UDP/IPv4

   This appendix describes an example profile for the compression of
   RTP/UDP/IPv4.

   Note that the probabilities listed for each encoding method are
   initial estimates only. These need to be refined with more accurate
   values from genuine RTP/UDP/IPv4 streams.

B.1.  General parameters

   ALIGN      :  8
   NPATTERNS  :  224
   MAXHEADERS :  1000

   For simplicity, the lookup tables are initialized next to the
   encoding methods in which they are used.

B.2.  Field-specific parameters

   The field-specific parameters are given below for UO-mode with a
   sequential IP ID.

   The modifications required to the inputset for a random IP ID are
   described in Section B.3.

   Each encoding method is listed in the form of a table as per Section
   5.2.


   RTP/UDP/IPv4-ENCODING:

   +------------------------+----------------------------+-------------+
   |      Field Name(s)     |      Encoding Method       | Probability |
   +------------------------+----------------------------+-------------+
   |          CRC           |        IRREGULAR(3)        |      99%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(7)        |       1%    |
   +------------------------+----------------------------+-------------+
   | Master Sequence Number |         LSB(4,-1)          |     100%    |
   +------------------------+----------------------------+-------------+
   |       RTP Header       |        RTP-ENCODING        |     100%    |
   +------------------------+----------------------------+-------------+
   |       UDP Header       |        UDP-ENCODING        |     100%    |
   +------------------------+----------------------------+-------------+
   |   Inner IPv4 Header    |       IPv4-ENCODING        |     100%    |
   +------------------------+----------------------------+-------------+
   |  Optional Extensions   |     EXTENSION-ENCODING     |     100%    |
   +------------------------+----------------------------+-------------+

   Notes:

   The term "RTP Header" is shorthand for all of the fields in the RTP
   header. A similar note applies for the "UDP Header" name etc.

   EXTENSION-ENCODING:

   +------------------------+----------------------------+-------------+
   |      Field Name(s)     |      Encoding Method       | Probability |
   +------------------------+----------------------------+-------------+
   |   Inner Auth Header    |        AH-ENCODING         |      99%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(0)        |       1%    |
   +------------------------+----------------------------+-------------+
   |    Inner ESP Header    |        ESP-ENCODING        |      99%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(0)        |       1%    |
   +------------------------+----------------------------+-------------+
   |    Inner GRE Header    |        GRE-ENCODING        |      99%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(0)        |       1%    |
   +------------------------+----------------------------+-------------+
   |   Outer IPv4 Header    |    OUTER-IPv4-ENCODING     |      99%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(0)        |       1%    |
   +------------------------+----------------------------+-------------+
   |   Outer Auth Header    |     OUTER-AH-ENCODING      |      99%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(0)        |       1%    |
   +------------------------+----------------------------+-------------+
   |    Outer ESP Header    |     OUTER-ESP-ENCODING     |      99%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(0)        |       1%    |
   +------------------------+----------------------------+-------------+
   |    Outer GRE Header    |     OUTER-GRE-ENCODING     |      99%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(0)        |       1%    |
   +------------------------+----------------------------+-------------+

   Notes:

   The optional headers in the protocol stack are treated in exactly the
   same manner as mandatory headers, except that an additional encoding
   method is needed to delete the optional header when it is no longer
   needed.

   Further information on this topic can be found in Section 6.2.2.

   The minimal encapsulation header is not considered because it is not
   covered by [ROHC5].

   IPv4-ENCODING: TTL{"",""}

   +------------------------+----------------------------+-------------+
   |      Field Name(s)     |      Encoding Method       | Probability |
   +------------------------+----------------------------+-------------+
   |        Version         |           STATIC           |     100%    |
   |     Header Length      |                            |             |
   |     Reserved Flag      |                            |             |
   |   Last Fragment Flag   |                            |             |
   |    Fragment Offset     |                            |             |
   |     Source Address     |                            |             |
   |  Destination Address   |                            |             |
   +------------------------+----------------------------+-------------+
   |     Packet Length      |          INFERRED          |     100%    |
   |    Header Checksum     |                            |             |
   |        Protocol        |                            |             |
   +------------------------+----------------------------+-------------+
   |   May Fragment Flag    |           STATIC           |    99.9%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(1)        |     0.1%    |
   +------------------------+----------------------------+-------------+
   |    Type of Service     |           STATIC           |    99.9%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(8)        |     0.1%    |
   +------------------------+----------------------------+-------------+
   |         IP-ID          |          INFERRED          |      99%    |
   |                        +----------------------------+-------------+
   |                        |          LSB(5,0)          |     0.7%    |
   |                        +----------------------------+-------------+
   |                        |          LSB(8,0)          |     0.2%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(16)       |     0.1%    |
   +------------------------+----------------------------+-------------+
   |      Time to Live      |           STATIC           |      99%    |
   |                        +----------------------------+-------------+
   |                        |         READ(TTL,2)        |     0.9%    |
   |                        +----------------------------+-------------+
   |                        |       WRITE(8,TTL,2)       |     0.1%    |
   +------------------------+----------------------------+-------------+
   |   Network Byte Order   |           STATIC           |    99.9%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(1)        |     0.1%    |
   +------------------------+----------------------------+-------------+

   Notes:

   The Header Length field is not static when IP options are used. The
   inputset could be extended to handle this possibility if needed.

   The inputset could also be extended to handle packet fragmentation if
   required.

   The Network Byte Order (NBO) field is a Miscellaneous Information
   field (see Section 7.3). Its use is described in Section 4.5.5 of
   [ROHC5].

   The Packet Length field is inferred from information provided by the
   link layer.
   The Header Checksum field is inferred by recalculating the IP
   checksum at the decompressor.
   The Protocol field is inferred from the next header in the chain.
   The IP Identification field is inferred from the Master Sequence
   Number.
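   The checksum recalculation is the standard Internet one's-complement
   checksum; as an illustration, the decompressor-side computation
   might look like:

```python
def ipv4_checksum(header):
    """Standard Internet one's-complement checksum over 16-bit words;
    the Header Checksum field itself must be zeroed in the input."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:             # fold any carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# 20-byte header (checksum field zeroed); recomputed value is 0xB861
hdr = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
```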


   UDP-ENCODING:

   +------------------------+----------------------------+-------------+
   |      Field Name(s)     |      Encoding Method       | Probability |
   +------------------------+----------------------------+-------------+
   |       Source Port      |           STATIC           |     100%    |
   |    Destination Port    |                            |             |
   +------------------------+----------------------------+-------------+
   |       UDP Length       |          INFERRED          |     100%    |
   |      UDP Checksum      |                            |             |
   +------------------------+----------------------------+-------------+
   |   UDP Checksum Value   |        UNCOMPRESSED        |     100%    |
   +------------------------+----------------------------+-------------+

   Notes:

   Even though the UDP Checksum is always classed as INFERRED, it may
   still be transmitted in full using the "UDP Checksum Value" field.
   The UDP Checksum is then copied from this value.

   The UDP Checksum Value field is a Miscellaneous Information field. If
   context(UDP checksum) is zero at the compressor then the UDP Checksum
   Value field is empty, otherwise it carries hdr(UDP Checksum) in full.

   The length of the UDP Checksum Value field is inferred at the
   decompressor from context(UDP Checksum). If this is zero then the UDP
   Checksum Value field is empty, otherwise it is 16 bits long.

   The UDP Length field is inferred from information provided by the
   link layer.
   The UDP Checksum field is inferred from the UDP Checksum Value field.
   If this is empty then the UDP Checksum is zero, otherwise it is
   copied in full from the UDP Checksum Value field.
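   The inference rules above amount to the following decompressor-side
   logic (a sketch; names are ours):

```python
def checksum_value_bits(context_udp_checksum):
    """Length in bits of the UDP Checksum Value field, inferred at the
    decompressor from context(UDP Checksum)."""
    return 0 if context_udp_checksum == 0 else 16

def infer_udp_checksum(context_udp_checksum, checksum_value=None):
    """checksum_value is None exactly when the field is empty."""
    if context_udp_checksum == 0:
        return 0                   # checksum disabled for this flow
    return checksum_value          # carried in full in every packet
```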

   RTP-ENCODING:

   +------------------------+----------------------------+-------------+
   |      Field Name(s)     |      Encoding Method       | Probability |
   +------------------------+----------------------------+-------------+
   |        Version         |           STATIC           |     100%    |
   |        Padding         |                            |             |
   |       Extension        |                            |             |
   |         SSRC           |                            |             |
   +------------------------+----------------------------+-------------+
   |      CSRC Counter      |          INFERRED          |     100%    |
   |    Sequence Number     |                            |             |
   |     RTP Timestamp      |                            |             |
   +------------------------+----------------------------+-------------+
   |      Payload Type      |           STATIC           |    99.9%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(7)        |     0.1%    |
   +------------------------+----------------------------+-------------+
   |       RTP Marker       |         VALUE("0")         |      90%    |
   |           TS           +----------------------------+-------------+
   |                        |         TALKSPURT          |      10%    |
   +------------------------+----------------------------+-------------+
   |          Tsc           |         VALUE("0")         |    99.9%    |
   |                        +----------------------------+-------------+
   |                        |         VALUE("1")         |     0.1%    |
   +------------------------+----------------------------+-------------+
   |       TS Stride        |           STATIC           |    99.9%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(8)        |    0.04%    |
   |                        +----------------------------+-------------+
   |                        |       IRREGULAR(16)        |    0.03%    |
   |                        +----------------------------+-------------+
   |                        |       IRREGULAR(24)        |    0.02%    |
   |                        +----------------------------+-------------+
   |                        |       IRREGULAR(32)        |    0.01%    |
   +------------------------+----------------------------+-------------+
   |      Time Stride       |           STATIC           |    99.9%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(8)        |    0.04%    |
   |                        +----------------------------+-------------+
   |                        |       IRREGULAR(16)        |    0.03%    |
   |                        +----------------------------+-------------+
   |                        |       IRREGULAR(24)        |    0.02%    |
   |                        +----------------------------+-------------+
   |                        |       IRREGULAR(32)        |    0.01%    |
   +------------------------+----------------------------+-------------+
   |      CSRC Item 1       |      CSRC-ENCODING(1)      |     100%    |
   +------------------------+----------------------------+-------------+
   |      CSRC Item 2       |      CSRC-ENCODING(2)      |     100%    |
   +------------------------+----------------------------+-------------+
   |          ...           |            ...             |     ...     |
   +------------------------+----------------------------+-------------+
   |      CSRC Item 16      |      CSRC-ENCODING(16)     |     100%    |
   +------------------------+----------------------------+-------------+

   Notes:

   Self-describing variable-length values are not needed by EPIC.

   The TS, Tsc, TS Stride and Time Stride fields are all Miscellaneous
   Information fields. They are used to reconstruct the RTP Timestamp at
   the decompressor. Their use is explained in Section 4.5.3 and Section
   4.5.4 of [ROHC5].

   The CSRC Counter field is inferred by counting the number of non-
   empty CSRC Item fields.
   The Sequence Number field is inferred from the Master Sequence Number
   by linear extrapolation.
   The RTP Timestamp is inferred from the TS, Tsc, TS Stride and Time
   Stride fields as in Section 4.5.3 and Section 4.5.4 of [ROHC5].

   TALKSPURT:

   +------------------------+----------------------------+-------------+
   |      Field Name(s)     |      Encoding Method       | Probability |
   +------------------------+----------------------------+-------------+
   |       RTP Marker       |         VALUE("0")         |      80%    |
   |                        +----------------------------+-------------+
   |                        |         VALUE("1")         |      20%    |
   +------------------------+----------------------------+-------------+
   |           TS           |        IRREGULAR(5)        |      90%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(8)        |     9.9%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(32)       |     0.1%    |
   +------------------------+----------------------------+-------------+

   Notes:

   The RTP Marker and TS fields are given their own encoding method
   because they behave in a dependent manner.
   A talkspurt requires some LSBs of timestamp to be communicated so
   that the correct timestamp value can be inferred from the wall clock.
   The RTP marker is "1" at the beginning of each talkspurt, but is "0"
   when the timestamp LSBs are resent for robustness.

   CSRC-ENCODING(i): CSRCVALUEi{""}

   +------------------------+----------------------------+-------------+
   |      Field Name(s)     |      Encoding Method       | Probability |
   +------------------------+----------------------------+-------------+
   |      CSRC Item i       |           STATIC           | 100-a-b-c%  |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(0)        |      a%     |
   |                        +----------------------------+-------------+
   |                        |     READ(CSRCVALUEi,1)     |      b%     |
   |                        +----------------------------+-------------+
   |                        |   WRITE(32,CSRCVALUEi,1)   |      c%     |
   +------------------------+----------------------------+-------------+

   Note:        a = b = c = (17 - i) / 100
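   For concreteness, the formula gives a = b = c = 0.16% for CSRC
   Item 1, leaving 99.52% for the STATIC row; a throwaway check
   (names are ours):

```python
def csrc_item_probs(i):
    """Percent probabilities for the CSRC Item i table (1 <= i <= 16)."""
    a = (17 - i) / 100             # a = b = c, in percent
    return {"STATIC": 100 - 3 * a,
            "IRREGULAR(0)": a,
            "READ": a,
            "WRITE": a}
```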


   AH-ENCODING: COMMONLENGTHS{"00","03","04","05","06"}
                AH-SPI{"","","","",""}

   +------------------------+----------------------------+-------------+
   |      Field Name(s)     |      Encoding Method       | Probability |
   +------------------------+----------------------------+-------------+
   |     AH Next Header     |          INFERRED          |     100%    |
   |      AH Reserved       |                            |             |
   +------------------------+----------------------------+-------------+
   |       AH Length        |           STATIC           |    99.9%    |
   |                        +----------------------------+-------------+
   |                        |   READ(COMMONLENGTHS,5)    |    0.09%    |
   |                        +----------------------------+-------------+
   |                        |  WRITE(8,COMMONLENGTHS,5)  |    0.01%    |
   +------------------------+----------------------------+-------------+
   |         AH SPI         |           STATIC           |    99.9%    |
   |                        +----------------------------+-------------+
   |                        |       READ(AH-SPI,5)       |   0.099%    |
   |                        +----------------------------+-------------+
   |                        |     WRITE(32,AH-SPI,5)     |   0.001%    |
   +------------------------+----------------------------+-------------+
   |   AH Sequence Number   |          INFERRED          |      99%    |
   |                        +----------------------------+-------------+
   |                        |         LSB(8,-1)          |     0.9%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(32)       |     0.1%    |
   +------------------------+----------------------------+-------------+
   | AH Authentication Data |        UNCOMPRESSED        |     100%    |
   +------------------------+----------------------------+-------------+

   Notes:

   The initial values for the COMMONLENGTHS lookup table are given in
   hexadecimal. They represent the lengths of the AH Authentication Data
   field for common hashing functions such as MD5, SHA-1 and so on.

   If the AH SPI field is null (i.e. 0 bits long) then the AH header is
   not present and hence all of the UNCOMPRESSED and INFERRED fields are
   null. Otherwise:

   The length of the AH Authentication Data field is known from the AH
   Length field.

   The AH Next Header field is inferred from the next header in the
   extension chain.
   The AH Reserved field is "0000000000000000".
   The AH Sequence Number field is inferred from the Master Sequence
   Number by linear extrapolation.

   ESP-ENCODING: PADDINGLENGTHS{0,0,0,0,0}
                 ESP-SPI{0,0,0,0,0}

   +------------------------+----------------------------+-------------+
   |      Field Name(s)     |      Encoding Method       | Probability |
   +------------------------+----------------------------+-------------+
   |        ESP SPI         |           STATIC           |    99.9%    |
   |                        +----------------------------+-------------+
   |                        |      READ(ESP-SPI,5)       |   0.099%    |
   |                        +----------------------------+-------------+
   |                        |    WRITE(32,ESP-SPI,5)     |   0.001%    |
   +------------------------+----------------------------+-------------+
   |  ESP Sequence Number   |          INFERRED          |      99%    |
   |                        +----------------------------+-------------+
   |                        |         LSB(8,-1)          |     0.9%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(32)       |     0.1%    |
   +------------------------+----------------------------+-------------+
   |      ESP Padding       |          INFERRED          |   99.99%    |
   |                        +----------------------------+-------------+
   |                        |        UNCOMPRESSED        |    0.01%    |
   +------------------------+----------------------------+-------------+
   |   ESP Padding Length   |           STATIC           |    99.9%    |
   |                        +----------------------------+-------------+
   |                        |   READ(PADDINGLENGTHS,5)   |    0.09%    |
   |                        +----------------------------+-------------+
   |                        | WRITE(8,PADDINGLENGTHS,5)  |    0.01%    |
   +------------------------+----------------------------+-------------+
   |    ESP Next Header     |          INFERRED          |     100%    |
   +------------------------+----------------------------+-------------+
   |     ESP Auth Data      |       IRREGULAR(96)        |     100%    |
   +------------------------+----------------------------+-------------+

   Notes:

   If the ESP SPI field is null then the ESP header is not present and
   hence all of the UNCOMPRESSED and INFERRED fields are null.
   Otherwise:

   The length of the ESP Padding field is known from the ESP Padding
   Length field.

   The ESP Sequence Number field is inferred from the Master Sequence
   Number by linear extrapolation.
   The ESP Padding field octets take the values 1, 2, 3, ..., k, where
   k is the ESP Padding Length.
   The ESP Next Header field is inferred from the next header in the
   extension chain.
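   The padding reconstruction is then immediate; as a sketch:

```python
def esp_padding(k):
    """ESP Padding octets 1, 2, ..., k inferred from the Padding Length."""
    return bytes(range(1, k + 1))
```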







Price et al.                                                 [PAGE 47]


INTERNET-DRAFT                   EPIC                November 17, 2000

   GRE-ENCODING:

   +------------------------+----------------------------+-------------+
   |      Field Name(s)     |      Encoding Method       | Probability |
   +------------------------+----------------------------+-------------+
   |    GRE Key Present     |          INFERRED          |     100%    |
   |  GRE Seq Num Present   |                            |             |
   |       GRE Flags        |                            |             |
   |      GRE Version       |                            |             |
   |      GRE Protocol      |                            |             |
   |      GRE Checksum      |                            |             |
   +------------------------+----------------------------+-------------+
   |  GRE Checksum Present  |           STATIC           |    99.9%    |
   |  GRE Routing Present   +----------------------------+-------------+
   | GRE Strict Source Rte  |        IRREGULAR(1)        |     0.1%    |
   +------------------------+----------------------------+-------------+
   | GRE Recursion Control  |           STATIC           |    99.9%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(3)        |     0.1%    |
   +------------------------+----------------------------+-------------+
   |   GRE Checksum Value   |        UNCOMPRESSED        |     100%    |
   +------------------------+----------------------------+-------------+
   |       GRE Offset       |           STATIC           |      99%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(0)        |     0.5%    |
   |                        +----------------------------+-------------+
   |                        |       IRREGULAR(16)        |     0.5%    |
   +------------------------+----------------------------+-------------+
   |        GRE Key         |           STATIC           |      99%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(0)        |     0.5%    |
   |                        +----------------------------+-------------+
   |                        |       IRREGULAR(32)        |     0.5%    |
   +------------------------+----------------------------+-------------+
   |  GRE Sequence Number   |          INFERRED          |      99%    |
   |                        +----------------------------+-------------+
   |                        |         LSB(8,-1)          |     0.9%    |
   |                        +----------------------------+-------------+
   |                        |        IRREGULAR(32)       |     0.1%    |
   +------------------------+----------------------------+-------------+
   |      GRE Routing       |           STATIC           |    99.9%    |
   |                        +----------------------------+-------------+
   |                        |        UNCOMPRESSED        |     0.1%    |
   +------------------------+----------------------------+-------------+

   Notes:

   The GRE Checksum Value field is a Miscellaneous Information field. If
   context(GRE Checksum) is zero at the compressor then the GRE Checksum
   Value field is empty, otherwise it carries hdr(GRE Checksum) in full.

   If the GRE Recursion Control field is null then the GRE header is not
   present and hence all of the INFERRED fields are null. Otherwise:


   The length of the GRE Checksum Value field is inferred at the
   decompressor from context(GRE Checksum). If this is zero then the GRE
   Checksum Value field is empty, otherwise it is 16 bits long.
   The GRE Routing field contains its own built-in length indicator.

   The GRE Flags field is "00000" and the GRE Version field is "000".

   The GRE Sequence Number field is inferred from the Master Sequence
   Number by linear extrapolation.

   The GRE Protocol field is inferred from the next header in the
   extension chain.

   The GRE Key Present and GRE Seq Num Present fields are "0" if their
   respective fields are null; otherwise they are "1".

   If the GRE Checksum Present and GRE Routing Present fields are both
   "0" then the GRE Checksum is null, else it is inferred from the GRE
   Checksum Value field. If this is empty then the GRE Checksum is
   zero; otherwise it is copied in full from the GRE Checksum Value
   field.
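
   The checksum reconstruction described above can be sketched as
   follows. This is a hypothetical helper, not part of the draft: the
   flags are taken as integers and the Checksum Value field as None
   when empty.

   ```python
   def infer_gre_checksum(checksum_present, routing_present, checksum_value):
       """Reconstruct hdr(GRE Checksum) per the notes above:
       - both flags "0": the GRE Checksum is null (absent);
       - flags set but Checksum Value field empty: checksum is zero;
       - otherwise: copy the 16-bit Checksum Value in full."""
       if checksum_present == 0 and routing_present == 0:
           return None          # field not present in the header
       if checksum_value is None:
           return 0             # context(GRE Checksum) was zero
       return checksum_value    # carried in full
   ```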

   OUTER-IPv4-ENCODING:

   Identical to IPv4-ENCODING but for fields in the outer IP header.
   Note that the NBO field is replaced by the NBO2 field.

   OUTER-AH-ENCODING:

   Identical to AH-ENCODING but for fields in the outer Authentication
   Header.

   OUTER-ESP-ENCODING:

   Identical to ESP-ENCODING but for fields in the outer ESP header.

   OUTER-GRE-ENCODING:

   Identical to GRE-ENCODING but for fields in the outer GRE header.

B.3.  Changes required for a random IP ID

   If the IP ID changes randomly then a new inputset can be used with
   the following encoding for the IP-ID field:

   +------------------------+----------------------------+-------------+
   |         IP-ID          |       IRREGULAR(16)        |     100%    |
   +------------------------+----------------------------+-------------+

   All other general and field-specific parameters are as above. Note
   that the two inputsets define two different sets of compressed header
   formats (one for sequential IP IDs, one for random IP IDs).

   Since the change between a sequential and a random IP ID occurs only
   rarely, there is no need to reserve space in the compressed headers
   to indicate when to change between the two sets of compressed header
   formats. Instead the change should be communicated using a STATIC or
   DYNAMIC packet.
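
   The switching rule can be sketched as follows. This is purely
   illustrative; the context flag name and return values are
   assumptions, not part of the draft:

   ```python
   def select_ip_id_inputset(context):
       """Pick the set of compressed header formats for the IP-ID.

       The random/sequential flag lives in the shared context and is
       updated only by STATIC or DYNAMIC packets, so ordinary
       compressed headers spend no bits signalling which inputset is
       in use."""
       if context.get("ip_id_random", False):
           return "IRREGULAR(16)"   # random IP ID: all 16 bits sent
       return "IPv4-ENCODING"       # sequential IP ID: table above
   ```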


Appendix C.  Proof of Optimal Efficiency

   A very important property of the compressed headers built by EPIC is
   that they are optimally efficient. This means that the average
   compressed header size of EPIC is provably the minimum possible for
   the chosen protocol stack. The precise statement of optimal
   efficiency is given below:

   "If every compressed header format specified by EPIC is used with
   the claimed probability, and if the distribution of headers within
   each format is flat, then the EPIC compressed headers are optimally
   efficient for the chosen level of robustness and word alignment".

   Note that the two caveats in the statement simply ensure that EPIC
   is programmed correctly for the protocol stack it is compressing.
   If the input to EPIC states that Field 1 needs LSB(4,0) encoding
   20% of the time, then optimal efficiency is achieved only if this
   really is the case. Similarly, each of the headers that uses a
   given compressed header format must occur equally often; otherwise
   the extra non-randomness could be exploited to compress the headers
   even further.

   The proof of optimal efficiency is straightforward in the bit-
   aligned case. Ordinary Huffman encoding operates on a stream of
   characters and provides optimal compression for each character in
   turn, as explained in [HUFF]. Since Hierarchical Huffman gives
   semantically identical results, the proof of optimality applies to
   it as well. EPIC treats an entire header as a single character in
   the Hierarchical Huffman tree, so it attains optimal compression
   for every header in the chosen protocol stack.
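
   The bit-aligned argument can be illustrated with a toy calculation.
   The sketch below builds a Huffman code over a hypothetical
   distribution of three compressed header formats; the probabilities
   and function name are invented for illustration and do not appear
   in the draft:

   ```python
   import heapq
   import math

   def huffman_lengths(probs):
       """Build a Huffman code over the given symbol probabilities
       and return the code length, in bits, assigned to each symbol."""
       # Heap entries: (probability, unique tiebreak, member symbols)
       heap = [(p, i, [i]) for i, p in enumerate(probs)]
       heapq.heapify(heap)
       lengths = [0] * len(probs)
       tiebreak = len(probs)
       while len(heap) > 1:
           p1, _, s1 = heapq.heappop(heap)
           p2, _, s2 = heapq.heappop(heap)
           for s in s1 + s2:
               lengths[s] += 1  # each merge adds one bit to members
           heapq.heappush(heap, (p1 + p2, tiebreak, s1 + s2))
           tiebreak += 1
       return lengths

   # Toy distribution over three hypothetical header formats
   probs = [0.99, 0.009, 0.001]
   lengths = huffman_lengths(probs)
   avg = sum(p * l for p, l in zip(probs, lengths))
   entropy = -sum(p * math.log2(p) for p in probs)
   ```

   Here the code lengths come out as 1, 2 and 2 bits, for an average
   of 1.01 bits per symbol against an entropy of roughly 0.086 bits:
   Huffman coding is optimal among codes that assign a whole number of
   bits to each symbol, which is the per-character optimality the
   argument above relies on.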

   In the general word-aligned case the compressed headers produced by
   EPIC are optimally efficient for the chosen alignment (for example
   in the octet-aligned case the average compressed header size is
   minimal given that all headers must be a whole number of octets).
   The proof can be easily adapted from the bit-aligned case.
