Network Working Group Richard Price, Siemens/Roke Manor
INTERNET-DRAFT Robert Hancock, Siemens/Roke Manor
Expires: May 2001 Stephen McCann, Siemens/Roke Manor
Mark A West, Siemens/Roke Manor
Abigail Surtees, Siemens/Roke Manor
Christian Schmidt, Siemens
November 17, 2000
Efficient Protocol Independent Compression (EPIC)
<draft-price-rohc-epic-00.txt>
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of [RFC-2026].
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This document is a submission to the IETF ROHC WG. Comments should be
directed to the mailing list of ROHC, rohc@cdt.luth.se.
Abstract
The RObust Header Compression [ROHC5] scheme is designed to compress
packet headers over error prone channels. It is structured into two
main parts. Firstly there is an extensible core framework covering
the basic ROHC protocol and packet formats, which is instantiated for
a particular protocol stack by means of a ROHC profile. Secondly
[ROHC5] also includes a number of specific profiles for particular
protocol stacks (RTP/UDP/IP etc.).
The Efficient Protocol Independent Compression (EPIC) scheme is a
general mechanism for providing additional profiles for use within
the ROHC framework. It can produce a set of compressed header formats
for an arbitrary protocol stack and a chosen level of robustness. As
with [ROHC5], the decompressed headers are semantically identical to
the original uncompressed headers. Moreover the headers generated by
EPIC have the special property that, in a well defined sense, they
are optimally efficient.
Price et al. [PAGE 1]
INTERNET-DRAFT EPIC November 17, 2000
Table of contents
1. Introduction.................................................3
2. Terminology..................................................4
3. Existing Solutions...........................................4
3.1. ROHC framework.............................................4
3.2. Location of EPIC in ROHC...................................4
3.3. ROHC profiles...............................................5
4. Header compression using EPIC................................5
4.1. Compression and decompression procedures...................5
4.2. Huffman compression and EPIC inputsets.....................7
5. EPIC Inputset................................................9
5.1. General parameters.........................................9
5.1.1. Lookup tables............................................9
5.1.2. Control of header alignment..............................9
5.1.3. Available memory.........................................10
5.2. Field-specific parameters..................................11
6. EPIC Encoding Toolbox........................................12
6.1. Toolbox members............................................13
6.1.1. STATIC...................................................13
6.1.2. IRREGULAR(k).............................................13
6.1.3. VALUE(v).................................................13
6.1.4. READ(LTABLE,i)...........................................13
6.1.5. WRITE(k,LTABLE,i)........................................14
6.1.6. LSB(k,p).................................................14
6.1.7. UNCOMPRESSED.............................................14
6.1.8. INFERRED.................................................14
6.2. Hints on applying the toolbox..............................15
6.2.1. Compressing variable-length fields.......................15
6.2.2. Compressing extension headers............................15
7. Control Fields...............................................17
7.1. CRC........................................................17
7.2. Master Sequence Number.....................................18
7.3. Miscellaneous Information..................................18
8. Extensibility................................................19
8.1. Other protocol stacks......................................19
8.2. New toolbox methods........................................19
8.3. New control fields.........................................20
8.4. Learning version of EPIC...................................20
9. EPIC Processing..............................................21
9.1. Structure of the Huffman tree..............................21
9.2. Compressor operation.......................................22
9.2.1. Step 1: Compression front end............................23
9.2.2. Step 2: Compressing the uncompressed header..............23
9.2.3. Step 3: Adding the UNCOMPRESSED fields...................23
9.3. Decompressor operation.....................................23
9.3.1. Step 1: Decompressing the compressed header..............23
9.3.2. Step 2: Reconstructing the INFERRED fields...............24
10. Security considerations.....................................24
11. Acknowledgements............................................24
12. Intellectual Property Considerations........................24
13. References..................................................25
14. Authors' Addresses..........................................25
Appendix A. EPIC Description in Pseudocode......................26
A.1. Structure of the Hierarchical Huffman Tree.................26
A.2. Compressor.................................................27
A.2.1. Step 1: Compression front end............................27
A.2.2. Step 2: Compressing the uncompressed header..............28
A.2.3. Step 3: Adding the UNCOMPRESSED fields...................29
A.3. Decompressor...............................................29
A.3.1. Step 1: Decompressing the compressed header..............29
A.3.2. Step 2: Decompression front end..........................30
A.3.3. Step 3: Reconstructing the INFERRED fields...............31
A.4. Hierarchical Huffman tree construction.....................32
A.4.1. Step 1: Process the EPIC inputset........................33
A.4.2. Step 2: Resolve complex encoding methods.................33
A.4.3. Step 3: Tree pruning.....................................35
A.4.4. Step 4: Creating the HH Tree.............................35
A.5. EPIC implementation issues.................................38
A.5.1. Arithmetic...............................................38
Appendix B. Example Profile for RTP/UDP/IPv4....................40
B.1. General parameters.........................................40
B.2. Field-specific parameters..................................40
B.3. Changes required for a random IP ID........................49
Appendix C. Proof of Optimal Efficiency..........................50
1. Introduction
This document describes a method for generating new compressed header
formats for [ROHC5]. The Efficient Protocol Independent Compression
(EPIC) scheme takes as its input a list of fields in the protocol
stack to be compressed, and for each field a choice of compression
techniques to be made available for the field. Using this input EPIC
derives a set of compressed header formats stored in the form of a
tree. These compressed header formats can then be used to quickly
compress and decompress headers.
An important property of EPIC is that for any chosen protocol it
generates a set of compressed headers for which the average header
size is provably minimal. In particular, for the same level of
robustness the compressed headers produced by EPIC are 5 - 10% more
efficient than those currently used in [ROHC5]. A subset of EPIC has
been implemented in accordance with this draft to demonstrate the
operation of the algorithms, and this is available from [IMPL].
Chapter 3 gives an overview of [ROHC5] and explains how EPIC fits
into the ROHC framework.
Chapter 4 describes the steps involved in the EPIC scheme.
Chapter 5 lists the inputset needed to create a new set of compressed
header formats.
Chapter 6 describes the eight basic compression techniques available
in the EPIC toolbox.
Chapter 7 discusses the EPIC control fields.
Chapter 8 considers the extensibility of EPIC.
Chapter 9 contains a step-by-step worked example of EPIC in action.
Appendix A gives a normative description of EPIC in pseudocode.
Appendix B lists an example EPIC inputset for RTP/UDP/IPv4.
Appendix C states and proves the optimal efficiency claim for EPIC.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC-2119].
3. Existing Solutions
This chapter describes the overall ROHC framework and the location of
EPIC within [ROHC5].
3.1. ROHC framework
[ROHC5] defines a general compressed header format that is used for
any compressed packet as well as feedback and IR (Initialization and
Refresh) packets. The general header format includes a CID (Context
Identifier) to indicate the packet stream to which the header
belongs. It also includes a packet type indicator to specify whether
the packet is a feedback, initialization or general compressed
header, whether it is segmented, and whether it contains padding.
The following packet type indicators are reserved in the overall
[ROHC5] framework:
1110: Padding or Add-CID octet
11110: Feedback
111110: IR-DYNAMIC packet
1111110: IR packet
1111111: Segment
The remaining packet types depend on the protocol stack to be
compressed. A ROHC profile is the specification of how to compress a
certain kind of header stream over a certain kind of link. Each ROHC
profile has its own set of compressed packet formats to complement
the general packet types common to all ROHC profiles.
In general the compressed header formats for a profile are octet-
aligned and do not start with the bit pattern 111XXXXX (since this is
reserved by the overall ROHC framework).
Note that as well as defining a new set of compressed header formats,
each ROHC profile also describes how the compressor and decompressor
should be initialized and when to change state or mode.
3.2. Location of EPIC in ROHC
The EPIC scheme generates new compressed header formats for use in
ROHC profiles.
The packet formats generated by EPIC are fully [ROHC5]-compatible.
The location of the EPIC compressed header within the general ROHC
packet is shown below:
0 7
--- --- --- --- --- --- --- ---
| Add-CID octet | if for CID 1-15 and small CIDs
+---+--- --- --- ---+--- --- ---+
| EPIC compressed header | 1 octet
+---+--- ---+---+---+--- --- ---+
| |
/ 0, 1, or 2 octets of CID / 1 or 2 octets if large CIDs
| |
+---+---+---+---+---+---+---+---+
/ EPIC compressed header / variable
+---+---+---+---+---+---+---+---+
Figure 1 : Location of EPIC within the general ROHC packet
Note that to ensure ROHC compatibility, the first octet of an EPIC
compressed header never uses the bit pattern 111XXXXX.
3.3. ROHC profiles
[ROHC5] currently contains four profiles:
Profile 0 for sending uncompressed IP packets
Profile 1 for RTP/UDP/IP compression
Profile 2 for UDP/IP compression
Profile 3 for ESP/IP compression
The basic function of EPIC is to generate new profiles for [ROHC5].
Each profile includes a set of compressed header formats that can
efficiently compress one protocol stack. Within each profile, the
size of the CRC checksum, length of the sequence number and memory
requirements for the profile can also be customized.
4. Header Compression using EPIC
This chapter outlines the EPIC scheme.
4.1. Compression and decompression procedures
Any ROHC-compatible header compression scheme can be considered to
operate as follows:
1. For an input uncompressed header, extract the non-redundant
information from the header fields using available context data as
appropriate.
2. Format this information into a compressed header (as in Figure 1)
for transfer to the decompressor.
3. Use this information, along with context data at the decompressor,
to reconstruct the original header.
This section describes in overview how each stage is carried out
using the EPIC scheme.
Figure 2a illustrates the per-packet processing which is done for
each header to be compressed and decompressed.
(1) | ^
| {Uncompressed {Uncompressed |
| header} header} |
v |
+-----------------+ |
| Classify | |
| packet | |
+-----------------+ |
| |
| {Header & |
| format} +-----------------+
v | Apply decoding |
+-----------------+ | methods and |
| Apply encoding | | reconstruct |
| methods | | header |
| (Section A.2.1) | | (Section A.3.2) |
+-----------------+ +-----------------+
| ^
| {Format & {Format & |
| M value} M value} |
v |
(2) +-----------------+ +-----------------+
| Tree | | Tree |
| traversal | | traversal |
| (Section A.2.2) | | (Section A.3.1) |
+-----------------+ +-----------------+
| ^
| {Codewords} {Codewords} |
v {Compressed |
+-----------------+ header} (3) +-----------------+
| Header | | Header |
| formatting & | ---------------> | reception & |
| transmission | | parsing |
+-----------------+ +-----------------+
Figure 2a : EPIC compression/decompression
For (1) EPIC carries out two principal steps. The first is to choose
a "compressed header format" for the uncompressed header. This format
defines how the non-redundant information will be extracted from each
field in the uncompressed header. Note that the set of available
formats depends on the chosen profile, and optionally on other
factors such as the compressor/decompressor mode.
For the second step of (1) the information is extracted from each
field in turn and the results combined together into a single bit
pattern, which can be thought of as a single large integer known as
the "M value". The process used to do this extraction for any given
field is called the "encoding method" for that field. The set of
fundamental encoding methods is called the "EPIC toolbox", and is
essentially the same as those used for the profiles defined in
[ROHC5]. The capabilities of EPIC are extensible simply by adding to
this toolbox (all other parts of the process are unaffected);
extensibility issues are considered further in Chapter 8.
The next step (2) involves taking the format and M value and
constructing the compressed header itself. The header formats are
stored using a tree, with each leaf node corresponding to exactly one
compressed header format. EPIC follows the tree from leaf to root,
using the M value to fill gaps in the compressed header where field
information is to be stored.
At the receiver, step (3) essentially carries out equivalent
operations in reverse. The decompressor has an identical tree, and
can use the EPIC compressed header to navigate a path through the
tree starting from the root. This results in a known compressed
header format and reconstructed M value. Since the set of encoding
methods used is now known (from the format) the M value can be
resolved into the non-redundant information from every field,
reconstructing each field in the uncompressed header.
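The way per-field information is combined into a single M value can be
sketched as a mixed-radix packing (illustrative Python, not the
normative algorithm of Appendix A; all names here are assumptions):

```python
# Hedged sketch: each encoding method yields an integer value and the
# number of distinct values it can take (its cardinality). Folding the
# pairs together produces one large integer, the "M value".

def pack_m_value(fields):
    """fields: list of (value, cardinality) pairs, one per field."""
    m = 0
    for value, cardinality in fields:
        assert 0 <= value < cardinality
        m = m * cardinality + value
    return m

def unpack_m_value(m, cardinalities):
    """Recover the per-field values; the cardinalities must match the
    encoding methods implied by the compressed header format."""
    values = []
    for cardinality in reversed(cardinalities):
        m, value = divmod(m, cardinality)
        values.append(value)
    values.reverse()
    return values

# Example: three fields with 2, 16 and 3 possible values each.
m = pack_m_value([(1, 2), (9, 16), (2, 3)])
assert unpack_m_value(m, [2, 16, 3]) == [1, 9, 2]
```

Because the decompressor learns the format (and hence the list of
cardinalities) from the tree traversal, the single integer M suffices
to recover every field's non-redundant information.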
4.2. Huffman compression and EPIC inputsets
Huffman compression is a well known technique used in many popular
compression schemes. EPIC uses a variant of Huffman encoding to
produce a new set of compressed header formats for the compressor and
decompressor. EPIC employs an optimization of the basic Huffman
technique which reduces memory and processing requirements within the
compressor/decompressor.
Basic Huffman encoding is designed to compress an arbitrary stream of
characters from a character set such as ASCII. The idea is to create
a new set of compressed characters, where each normal character maps
onto a compressed character and vice versa. Common characters are
given shorter compressed equivalents than rarely used characters,
reducing the average size of the data stream. The compression ratio
can be improved by increasing the character size, but at the expense
of higher memory requirements. In fact the memory required in a
Huffman compressor grows exponentially with the character size, so
16-bit characters need 256 times as much memory as 8-bit characters.
The fundamental idea of EPIC is to apply Huffman encoding to a stream
of packet headers, where each header is treated as a single
character. The advantage of this choice is that Huffman encoding is
optimally efficient for a single character, so applying Huffman to
the whole header at once means that the average compressed header
size is provably the minimum possible. The disadvantage of basic
Huffman is that the character set is extremely large (for a 40 octet
protocol stack the character set contains up to 2^320 headers).
Clearly the basic Huffman technique cannot be used in this way.
The solution lies in a modified version of Huffman encoding known as
Hierarchical Huffman (HH). The HH algorithm compresses a data stream
with exactly the same level of efficiency as normal Huffman, but for
a fraction of the processing and memory cost.
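For illustration, basic Huffman coding over a small symbol set can be
sketched as follows (Python; the function name is an assumption, and
this is plain Huffman, not the Hierarchical Huffman algorithm of
Appendix A):

```python
# Illustrative only: basic Huffman coding over a tiny "character set".
# Real headers would form a vastly larger set, which is why EPIC needs
# the Hierarchical Huffman variant.
import heapq
import itertools

def huffman_code_lengths(probabilities):
    """Map each symbol to its Huffman code length in bits."""
    counter = itertools.count()          # tie-breaker for equal weights
    heap = [(p, next(counter), [sym]) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probabilities}
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:        # every merge adds one bit
            lengths[sym] += 1
        heapq.heappush(heap, (p1 + p2, next(counter), syms1 + syms2))
    return lengths

# Common "headers" receive shorter codes than rarely used ones.
lengths = huffman_code_lengths({"A": 0.7, "B": 0.2, "C": 0.1})
assert lengths["A"] < lengths["C"]
```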
The input required to build a new set of compressed header formats is
known as an "EPIC inputset". Each inputset is essentially a
structured description of the behavior of the protocol stack to be
compressed (along with some control information). EPIC converts an
inputset into a suitable form for the HH algorithm, which then builds
a unique HH tree containing the header formats. Inputsets have a
recursive structure, which makes it easy to build new inputsets out
of existing ones (e.g. when adding new layers to a protocol stack).
In advanced applications of EPIC new inputsets can even be
transmitted in a simple "profiling" message, which allows dynamic
updating of the header compression capabilities of a device.
Figure 2b describes the process of building the HH tree from the EPIC
inputset, which is done once only and the results stored at the
compressor and decompressor. References are given to pseudocode in
Appendix A which describes the various stages explicitly.
+-----------------------+
| Input Stage 1: |
| Enter the inputset |
| which describes |
| the protocol stack |
| (Section A.4.1) |
+-----------------------+
|
v
+-----------------------+
| Input Stage 2: |
| Resolve complex |
| encoding methods into |
| fundamental methods |
| (Section A.4.2) |
+-----------------------+
|
v
+-----------------------+
| Input Stage 3: |
| Prune low probability |
| header formats |
| (Section A.4.3) |
+-----------------------+
|
v
+-----------------------+
| Input Stage 4: |
| Run HH algorithm to |
| generate tree |
| (Section A.4.4) |
+-----------------------+
Figure 2b : HH Tree Construction
The following chapters describe the mechanisms of EPIC in greater
detail.
5. EPIC Inputset
This chapter describes the inputset that EPIC requires to create a
new set of compressed header formats. Section 5.1 describes the
general inputset parameters common to the entire protocol stack,
whilst Section 5.2 lists the inputset parameters specific to each
field.
5.1. General inputset parameters
The general parameters are used to configure the lookup tables, the
alignment of the compressed headers and the amount of memory reserved
for EPIC at the compressor and decompressor.
5.1.1. Lookup tables
One of the compression techniques offered by EPIC is table-based item
compression as described in Section 5.8.1 of [ROHC5]. This
compression method allows an item of data to be transmitted in full
together with an index. The item is stored in a lookup table at the
position specified by the index, and once the compressor gains enough
confidence that the decompressor has successfully received the
mapping it can send the item by transmitting its index only.
EPIC allows multiple lookup tables to be set up, where each lookup
table is capable of storing a fixed number of items. Note that
smaller lookup tables achieve better compression efficiency since
fewer bits are required to index an item in the table.
Each lookup table is given a unique name by which it can be
referenced. Lookup tables are initialized as shown in the example
below:
LTABLE{"0000","0101","110111"}
The name of the lookup table is given, followed by a list of bit
patterns that represent the initial entries in the lookup table. Note
that a single lookup table may sometimes store bit patterns of
several different lengths (for example when the lookup table is used
to store values from a variable-length field).
Lookup tables are accessed and updated using a pair of encoding
techniques as described in Chapter 6.
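The lookup table behaviour described above can be sketched as follows
(illustrative Python; the class and method names are assumptions, not
part of the draft):

```python
# Hedged sketch of an EPIC lookup table. A smaller table needs fewer
# bits to transmit an index, matching the efficiency note above.
import math

class LookupTable:
    def __init__(self, name, initial_entries):
        self.name = name
        self.entries = list(initial_entries)
        self.size = len(initial_entries)   # fixed number of items

    def index_bits(self):
        """Bits needed to carry an index in the compressed header."""
        return math.ceil(math.log2(self.size))

    def write(self, index, bit_pattern):
        """WRITE: the value is sent in full and stored at the index."""
        self.entries[index] = bit_pattern

    def read(self, index):
        """READ: the value is recovered from its short index alone."""
        return self.entries[index]

table = LookupTable("LTABLE", ["0000", "0101", "110111"])
assert table.read(2) == "110111"
assert table.index_bits() == 2       # 3 entries fit in a 2-bit index
```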
5.1.2. Control of header alignment
The alignment of the compressed headers is controlled using the ALIGN
and NPATTERNS parameters. Each of the parameters is an integer.
All of the compressed headers produced by EPIC are guaranteed to be
an integer multiple of ALIGN bits long. Moreover, the first word of
each compressed header (where one word is ALIGN bits) uses only
NPATTERNS of the 2^ALIGN different bit patterns available.
For example, suppose that ALIGN = 8 (the octet-aligned case) and
NPATTERNS = 224. In the first octet of every compressed header only
224 of the available 256 bit patterns are used by EPIC, namely the
bit patterns 00000000, 00000001 through to 11011111 inclusive. The
bit patterns which are not used are 11100000 through to 11111111.
In fact, it is important for EPIC not to use the bit patterns
111XXXXX in the first octet of each compressed header because they
are reserved by the ROHC framework. The correct parameters for EPIC
to produce [ROHC5]-compatible compressed header formats are ALIGN = 8
and NPATTERNS = 224.
Some useful values for ALIGN and NPATTERNS are listed below:
Compressed Headers ALIGN NPATTERNS
Bit-aligned 1 2
Octet-aligned 8 256
Octet-aligned with 8 224
111XXXXX reserved
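The octet-aligned case with 111XXXXX reserved can be checked with a
short sketch (illustrative Python, not normative):

```python
# With ALIGN = 8 and NPATTERNS = 224, the first octet of a compressed
# header is one of the patterns 00000000 .. 11011111, so the patterns
# 111XXXXX reserved by the ROHC framework are never produced.

ALIGN = 8
NPATTERNS = 224

def first_word_is_rohc_compatible(first_octet):
    return 0 <= first_octet < NPATTERNS

assert first_word_is_rohc_compatible(0b11011111)       # last usable pattern
assert not first_word_is_rohc_compatible(0b11100000)   # reserved 111XXXXX
assert NPATTERNS == 2**ALIGN - 2**5                    # 256 - 32 reserved
```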
5.1.3. Available memory
The MAXFORMATS and PTHRESH parameters control the number of
compressed header formats to be stored at the compressor and
decompressor.
The MAXFORMATS parameter determines the maximum number of compressed
header formats stored by EPIC. If more compressed header formats are
generated than can be stored then EPIC discards all but the
MAXFORMATS most likely formats to be used. Any packet requiring a
compressed header format that has been discarded is communicated
using a DYNAMIC packet instead.
Similarly, the PTHRESH parameter specifies a probability threshold
for the compressed header formats. Any format expected to be used
with probability less than PTHRESH is discarded to conserve memory at
the compressor and decompressor. A DYNAMIC packet can be used to
communicate the information instead.
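The pruning controlled by MAXFORMATS and PTHRESH can be sketched as
follows (illustrative Python; the format names and data layout are
assumptions):

```python
# Hedged sketch: keep at most MAXFORMATS header formats, discarding
# any whose probability of use falls below PTHRESH.

def prune_formats(formats, maxformats, pthresh):
    """formats: list of (name, probability) pairs."""
    kept = [f for f in formats if f[1] >= pthresh]
    kept.sort(key=lambda f: f[1], reverse=True)   # most likely first
    return kept[:maxformats]

formats = [("F0", 0.60), ("F1", 0.25), ("F2", 0.10), ("F3", 0.0005)]
kept = prune_formats(formats, maxformats=3, pthresh=0.001)
assert [name for name, _ in kept] == ["F0", "F1", "F2"]
```

Packets needing a discarded format fall back to a DYNAMIC packet, as
described above.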
The MAXFORMATS and PTHRESH parameters affect the EPIC compressed
header formats, and so for interoperability they MUST be specified as
part of the EPIC inputset.
Note that one compressed header format requires approximately 2
octets of storage space at the compressor and decompressor. Since the
header formats are only calculated once, it is possible to store them
in read-only memory.
Note also that EPIC requires additional working memory to compress
and decompress the headers, to store the lookup tables and so on.
However, EPIC places no extra working memory requirements on the
compressor and decompressor compared to [ROHC5].
5.2. Field-specific parameters
EPIC contains a toolbox of eight basic compression techniques (LSB
compression, lookup table compression etc.). The field-specific
parameters indicate which encoding methods from the toolbox should be
applied to which fields in the protocol stack.
The basic plan is to build up more complex encoding methods from the
set of eight techniques in the encoding toolbox (Chapter 6). This
allows us to construct an encoding method sufficiently powerful to
handle the entire protocol stack.
A new encoding method is described by means of a table as shown below
(N.B. the normative version of IPv4 encoding is found in Appendix B):
IPv4-ENCODING: TTL{"",""}
+------------------------+----------------------------+-------------+
| Field Name(s) | Encoding Method | Probability |
+------------------------+----------------------------+-------------+
| Version | STATIC | 100% |
| Header Length | | |
| Reserved Flag | | |
| May Fragment Flag | | |
| Last Fragment Flag | | |
| Fragment Offset | | |
| Source Address | | |
| Destination Address | | |
+------------------------+----------------------------+-------------+
| Packet Length | INFERRED | 100% |
| Header Checksum | | |
| Protocol | | |
+------------------------+----------------------------+-------------+
| Type of Service | STATIC | 99.9% |
| +----------------------------+-------------+
| | IRREGULAR(8) | 0.1% |
+------------------------+----------------------------+-------------+
| Identification | STATIC | 99.9% |
| +----------------------------+-------------+
| | IRREGULAR(16) | 0.1% |
+------------------------+----------------------------+-------------+
| Time to Live | STATIC | 99% |
| +----------------------------+-------------+
| | READ(TTL,2) | 0.9% |
| +----------------------------+-------------+
| | WRITE(8,TTL,2) | 0.1% |
+------------------------+----------------------------+-------------+
IPv4-ENCODING is the name of the new encoding method. TTL{"",""} is a
lookup table containing two entries, which is used to store the two
most recent values of the Time to Live field. This lookup table is
useful because the TTL field sometimes alternates between two
different values (this is possible, for example, in the case of load
sharing over paths of unequal hop counts).
The Field Name(s) column contains a list of field names from the
protocol stack to be compressed (in this case an IPv4 header).
Notice that there is no length indication for any of the fields. This
is because the length of the field can be inferred from the field
name. EPIC can also cope with variable-length fields (as described in
Section 6.2.1).
For each field (or set of fields) the table includes a list of
possible encoding methods. The encoding methods can be chosen from
the set of eight techniques in the encoding toolbox of Chapter 6 (in
the examples the STATIC, IRREGULAR, LSB, READ, WRITE and INFERRED
encoding methods are all from the toolbox). Alternatively, the
encoding methods can be defined by their own tables as shown below:
RTP/UDP/IPv4-ENCODING:
+------------------------+----------------------------+-------------+
| Field Name(s) | Encoding Method | Probability |
+------------------------+----------------------------+-------------+
| CRC | IRREGULAR(3) | 100% |
+------------------------+----------------------------+-------------+
| Master Sequence Number | LSB(4,-1) | 100% |
+------------------------+----------------------------+-------------+
| IPv4 Header | IPv4-ENCODING | 100% |
+------------------------+----------------------------+-------------+
| UDP Header | UDP-ENCODING | 100% |
+------------------------+----------------------------+-------------+
| RTP Header | RTP-ENCODING | 100% |
+------------------------+----------------------------+-------------+
The RTP/UDP/IPv4 encoding method uses the IPv4 encoding method
defined previously. In this manner it is possible to build some
powerful and flexible encoding methods.
The final column lists the probability that each encoding method will
be used to encode the field in question. This is a very important
column since EPIC ensures that common encoding methods require fewer
bits to carry the compressed data than rarely used encoding methods.
Note that the order in which fields are given and the order of the
encoding types for a particular field is significant in the following
sense:
- changing the order does not impact the compression/decompression
performance of the profile; however
- changing the order does modify the format and content of the
compressed headers themselves.
Therefore, for interoperability, both the compressor and decompressor
MUST have a common set of input tables including all ordering
considerations.
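The nesting of encoding methods described in this section can be
modelled with a simple data structure (illustrative Python; the
layout is an assumption, not the draft's normative table syntax):

```python
# Hedged sketch: each table row pairs a tuple of field names with a
# list of (encoding method, probability) alternatives. A sub-method
# such as IPv4-ENCODING can appear where a toolbox method would.

IPV4_ENCODING = [
    (("Version", "Header Length"), [("STATIC", 1.00)]),
    (("Time to Live",), [("STATIC", 0.99),
                         ("READ(TTL,2)", 0.009),
                         ("WRITE(8,TTL,2)", 0.001)]),
]

RTP_UDP_IPV4_ENCODING = [
    (("CRC",), [("IRREGULAR(3)", 1.00)]),
    (("Master Sequence Number",), [("LSB(4,-1)", 1.00)]),
    (("IPv4 Header",), [(IPV4_ENCODING, 1.00)]),   # nested method
]

# Every field's alternative probabilities should sum to 1.
for _, methods in IPV4_ENCODING + RTP_UDP_IPV4_ENCODING:
    assert abs(sum(p for _, p in methods) - 1.0) < 1e-9
```

Because nested methods are ordinary entries, an inputset for a new
protocol stack can reuse existing encodings layer by layer.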
6. EPIC Encoding Toolbox
ROHC can encode header fields using a number of different encoding
techniques (LSB encoding, scaled timestamp encoding, list-based
compression etc.). The EPIC encoding toolbox supports all of the
techniques currently required by [ROHC5]. Different encoding methods
can be assigned to different fields using a simple table of inputs as
described in Section 5.2.
6.1. Toolbox members
The EPIC encoding toolbox contains the following techniques:
6.1.1. STATIC
The STATIC encoding method can be used when the header field does not
change relative to the context. If a field is STATIC then no
information concerning the field need be transmitted in the
compressed header.
The STATIC command has no parameters.
6.1.2. IRREGULAR(k)
The IRREGULAR(k) encoding method is used when the field cannot be
compressed relative to the context, and hence must be transmitted in
full.
The parameter k is the (uncompressed) field length in bits.
Note that it is possible to use the IRREGULAR encoding method to
compress variable-length fields. This is accomplished by specifying a
separate IRREGULAR(k) encoding method for every common length of the
variable-length field. Further information can be found in Section
6.2.1.
6.1.3. VALUE(v)
The VALUE(v) encoding method has a single parameter v, which is a
field value represented as a bit pattern. The VALUE(v) encoding can
be used to transmit any field which takes value v.
Note that VALUE(v) is just a simplified form of lookup table
compression. It is useful for compressing fields which take a well-
known value with high probability.
6.1.4. READ(LTABLE,i)
The READ(LTABLE,i) encoding method retrieves the field value from the
lookup table LTABLE. Recall that EPIC maintains a number of distinct
lookup tables for storing and retrieving common field values. If the
value of a field is stored in the lookup table LTABLE then it can be
efficiently compressed by sending its position in the lookup table
only.
The parameter LTABLE is the name of the lookup table. The parameter i
is the number of items which the lookup table can store.
Common field values can be stored in the lookup table in two ways.
Firstly they might be stored in the table at initialization of the
compressor/decompressor. Secondly they can be updated by using the
WRITE encoding method.
If the value of the field is not present in any lookup table then the
READ encoding method cannot be used to compress the field.
6.1.5. WRITE(k,LTABLE,i)
The WRITE encoding method sends the field in full and also stores it
in the lookup table LTABLE. Subsequently, whenever the field value
recurs it can be retrieved from the lookup table using the
READ(LTABLE,i) encoding method (provided of course that the WRITE
method has been used enough times to ensure robustness).
The WRITE command has the following 3 parameters:
k: Uncompressed field length in bits
LTABLE: Name of lookup table
i: Number of values which the lookup table can store
6.1.6. LSB(k,p)
The LSB(k,p) method compresses the field using LSB encoding.
LSB encoding is described in Section 4.5.1 of [ROHC5]. The meaning of
the two parameters k and p is described in [ROHC5], but basically the
parameters have the following functions:
k: Number of LSBs to transmit in the compressed header
p: Offset of interpretation interval
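The interplay of the two parameters can be sketched as follows. This
is a simplified rendering of the interpretation-interval scheme of
[ROHC5] Section 4.5.1, with an assumed fixed reference value rather
than the full window management:

```python
def lsb_encode(value, k):
    """Transmit only the k least significant bits of the field."""
    return value & ((1 << k) - 1)

def lsb_decode(lsbs, k, v_ref, p):
    """Pick the unique value in the interpretation interval
    [v_ref - p, v_ref + 2^k - 1 - p] whose k LSBs match.
    Simplified sketch of [ROHC5] Section 4.5.1."""
    lower = v_ref - p
    mask = (1 << k) - 1
    candidate = (lower & ~mask) | lsbs
    if candidate < lower:
        candidate += 1 << k   # wrap up into the interval
    return candidate

# k=4, p=0: any value in [v_ref, v_ref + 15] survives 4-bit encoding.
assert lsb_decode(lsb_encode(105, 4), 4, 100, 0) == 105
# p=-1 shifts the interval to suit strictly increasing fields.
assert lsb_decode(lsb_encode(112, 4), 4, 100, -1) == 112
```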
6.1.7. UNCOMPRESSED
The UNCOMPRESSED encoding method is used to send the field value in
full at the end of the compressed header.
Note that all UNCOMPRESSED fields are sent concatenated together at
the end of the compressed header. This means that the decompressor
needs to know the length of each UNCOMPRESSED field in order to
reconstruct the uncompressed header. The way in which the field
length is inferred is specific to each UNCOMPRESSED field, and is
described separately from the table of the EPIC inputset.
6.1.8. INFERRED
The INFERRED method instructs EPIC to infer the value of the field
from other field values and/or the context.
The INFERRED method has no parameters. The precise method by which
INFERRED fields are reconstructed is specific to each field, and is
described separately from the table of the EPIC inputset.
6.2. Hints on applying the toolbox
This section gives some hints on how the EPIC toolbox can be applied
to compress different types of field.
6.2.1. Compressing variable-length fields
EPIC can compress variable-length fields efficiently by allocating a
separate encoding method for each possible length of the field. This
takes into account the probability distribution of the field length,
so more common field sizes can be compressed with better efficiency.
This technique can be used to compress the GRE Key field as follows:
GRE-ENCODING:
+------------------------+----------------------------+-------------+
| Field Name(s) | Encoding Method | Probability |
+------------------------+----------------------------+-------------+
| ... | ... | ... |
+------------------------+----------------------------+-------------+
| GRE Key Present | INFERRED | 100% |
+------------------------+----------------------------+-------------+
| GRE Key | STATIC | 99% |
| +----------------------------+-------------+
| | IRREGULAR(0) | 0.5% |
| +----------------------------+-------------+
| | IRREGULAR(32) | 0.5% |
+------------------------+----------------------------+-------------+
| ... | ... | ... |
+------------------------+----------------------------+-------------+
The GRE Key field has two possible lengths (0 octets or 4 octets). If
the key is not static then it can be sent in full with either of
these two lengths. The GRE Key Present field is inferred from the
size of the GRE Key.
Note that if the field has many different lengths, for simplicity
only the most common field lengths need be sent using the
IRREGULAR(k) encoding. Fields with less common lengths can be sent
UNCOMPRESSED.
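The selection logic implied by the GRE-ENCODING table can be sketched
as follows; the function and its shape are illustrative, not taken
from the draft:

```python
def choose_gre_key_encoding(key, context_key):
    """Pick a toolbox method for the GRE Key, mirroring the
    GRE-ENCODING table above. 'key' is None when the field is absent.
    Illustrative only: the compressor is free to choose any format
    capable of reconstructing the header."""
    if key == context_key:
        return "STATIC"          # 99% case: unchanged relative to context
    if key is None:
        return "IRREGULAR(0)"    # key absent: zero-length field
    return "IRREGULAR(32)"       # key present: send all 4 octets in full
```

A field with many possible lengths would simply add one IRREGULAR(k)
branch per common length, with rarer lengths falling back to
UNCOMPRESSED.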
6.2.2. Compressing extension headers
Protocols such as IP have a number of optional extensions that
follow the basic header. In [ROHC5] these extensions are handled by
defining a new list-based compression type. In EPIC however, the
extension headers can be handled using the basic toolbox encoding
methods with no special enhancements.
An example encoding method for the AH header is given below:
RTP/UDP/IPv4-ENCODING:
+------------------------+----------------------------+-------------+
| Field Name(s) | Encoding Method | Probability |
+------------------------+----------------------------+-------------+
| ... | ... | ... |
+------------------------+----------------------------+-------------+
| Authentication Header | AH-ENCODING | 99% |
| +----------------------------+-------------+
| | IRREGULAR(0) | 1% |
+------------------------+----------------------------+-------------+
| ... | ... | ... |
+------------------------+----------------------------+-------------+
N.B. The term "Authentication Header" in the Field Name(s) column is
just shorthand for the set of fields in the authentication header.
AH-ENCODING: COMMONLENGTHS{"00","03","04","05","06"}
AH-SPI{"","","","",""}
+------------------------+----------------------------+-------------+
| Field Name(s) | Encoding Method | Probability |
+------------------------+----------------------------+-------------+
| AH Next Header | INFERRED | 100% |
| AH Reserved | | |
+------------------------+----------------------------+-------------+
| AH Length | STATIC | 99.9% |
| +----------------------------+-------------+
| | READ(COMMONLENGTHS,5) | 0.09% |
| +----------------------------+-------------+
| | WRITE(8,COMMONLENGTHS,5) | 0.01% |
+------------------------+----------------------------+-------------+
| AH SPI | STATIC | 99.9% |
| +----------------------------+-------------+
| | READ(AH-SPI,5) | 0.099% |
| +----------------------------+-------------+
| | WRITE(32,AH-SPI,5) | 0.001% |
+------------------------+----------------------------+-------------+
| AH Sequence Number | INFERRED | 99% |
| +----------------------------+-------------+
| | LSB(8,-1) | 0.9% |
| +----------------------------+-------------+
| | IRREGULAR(32) | 0.1% |
+------------------------+----------------------------+-------------+
| AH Authentication Data | UNCOMPRESSED | 100% |
+------------------------+----------------------------+-------------+
The AH header is encoded in the same manner as any other header in
the protocol stack. The only difference is that the entire AH header
may not be present in the stack, so an additional IRREGULAR(0)
encoding is required for the whole AH header. This encoding is used
to delete every field in the AH header when it is no longer present
in the uncompressed packet.
Note that IP extension headers do not necessarily occur in the same
order for every packet stream. This is not a problem for EPIC
because each field in the protocol stack is transmitted separately
(and not necessarily in order). The individual fields are
reconstructed at the decompressor and then placed in the
uncompressed header at their correct locations.
If the order of any fields changes then EPIC transmits the fields
uncompressed. When this information has been sent sufficient times
to ensure robustness, the decompressor knows the new ordering of the
fields and can subsequently reconstruct the uncompressed headers
using the new field ordering as appropriate.
7. Control fields
In general the fields specified in the EPIC input tables are taken
directly from the protocol stack to be compressed. However there are
a small number of special fields that can be transmitted in the
compressed headers, but which are not present in the uncompressed
packets. These fields are known as control fields because they
transmit control information from the compressor to the decompressor.
The most obvious example of a control field is the header compression
CRC. This CRC is calculated over the uncompressed header, and is used
to detect incorrect decompression at the decompressor.
EPIC treats the control fields in the same manner as any normal
field. The only difference is that at the compressor the control
field values are calculated rather than obtained directly from the
uncompressed packet. Similarly the control field values are used by
the decompressor itself (e.g. to check the header for errors) and are
not forwarded in the decompressed packet.
EPIC supports the following control fields:
7.1. CRC
The CRC field holds a CRC checksum calculated across the entire
uncompressed header, as described in Section 5.9.2 of [ROHC5]. At the
decompressor the CRC field is used to validate that correct
decompression has occurred.
Note that it is possible for different header formats to have
different amounts of CRC protection, so extra CRC bits can be
allocated to protect important context-updating information. This is
accomplished by creating a new encoding method for the fields to be
given extra CRC protection as shown in the example below:
EXAMPLE-ENCODING:
+------------------------+----------------------------+-------------+
| Field Name(s) | Encoding Method | Probability |
+------------------------+----------------------------+-------------+
| Important Field | NORMAL-CRC | 50% |
| CRC +----------------------------+-------------+
| | IRREGULAR(8) | 50% |
+------------------------+----------------------------+-------------+
NORMAL-CRC:
+------------------------+----------------------------+-------------+
| Field Name(s) | Encoding Method | Probability |
+------------------------+----------------------------+-------------+
| Important Field | STATIC | 100% |
+------------------------+----------------------------+-------------+
| CRC | IRREGULAR(3) | 100% |
+------------------------+----------------------------+-------------+
Observe that the compressed header has 3 bits of CRC protection if
the important field is STATIC. However, if the field is IRREGULAR(8)
then the compressed header has 8 bits of CRC protection. This is
useful since the IRREGULAR behavior is context-updating.
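A sketch of how per-format CRC sizing might look. The bitwise CRC
routine and the polynomial values shown are assumptions of this
sketch; the normative polynomials and covered octets are defined in
Section 5.9 of [ROHC5].

```python
def crc_bits(data, width, poly):
    """Generic bitwise CRC over a byte string, register initialized to
    all ones (illustrative; see [ROHC5] Section 5.9 for the normative
    scheme)."""
    mask = (1 << width) - 1
    reg = mask
    top = 1 << (width - 1)
    for byte in data:
        for bit in range(8):
            feedback = bool(reg & top) ^ bool((byte >> bit) & 1)
            reg = (reg << 1) & mask
            if feedback:
                reg ^= poly & mask
    return reg

header = bytes.fromhex("450000281234")   # assumed uncompressed header bytes
crc3 = crc_bits(header, 3, 0b011)        # 3-bit CRC when the field is STATIC
crc8 = crc_bits(header, 8, 0x07)         # 8-bit CRC for context-updating formats
assert 0 <= crc3 < 8 and 0 <= crc8 < 256
```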
7.2. Master Sequence Number
The RTP/UDP/IP protocol stack contains many different sequence
numbers (the RTP sequence number, the ESP sequence number, the AH
sequence number and so on). One sequence number must be transmitted
to protect against bursts of consecutive lost packets, but it is then
possible to infer all other sequence numbers by linear extrapolation
(i.e. by adding or subtracting a constant integer).
Unfortunately, no sequence number field is guaranteed to be present
in every IP protocol stack. The RTP sequence number is not visible if
the IP payload is encrypted, whilst the ESP and AH sequence numbers
are only included if the optional ESP and AH headers are present. To
overcome this problem EPIC defines a control field called the Master
Sequence Number field. This field is present in every compressed
header and can be used to infer all other sequence numbers.
The size of the Master Sequence Number field is chosen for the best
trade-off between robustness and efficiency. A Master Sequence Number
of k bits prevents context failure for up to 2^k - 1 consecutive lost
packets.
7.3. Miscellaneous Information
The Miscellaneous Information fields transmit any extra information
that EPIC needs to reconstruct the UNCOMPRESSED and INFERRED fields,
but which is not already present in the compressed header.
Some examples of Miscellaneous Information fields used in RTP/UDP/IP
compression are given below:
TS: The Least Significant Bits (LSBs) of the RTP Timestamp, scaled
if Tsc = 1.
TS STRIDE: The predicted increment/decrement of the RTP Timestamp
field when it changes. The use for this field is explained in
Section 4.5.4 of [ROHC5].
TIME STRIDE: The predicted time interval (in milliseconds) between
changes in the RTP Timestamp. The use for this field is explained in
Section 4.5.4 of [ROHC5].
Tsc: Indicates whether the timestamp is scaled or not.
All of these fields are needed to infer the RTP timestamp. Note that
Miscellaneous Information fields are control fields because they are
discarded once they have been used to infer the relevant
UNCOMPRESSED or INFERRED fields.
8. Extensibility
This chapter considers the extensibility of the EPIC scheme,
including possible future enhancements to EPIC.
8.1. Other protocol stacks
Additional protocols will become important in the future, especially
for multimedia over IP: for example, tunneling protocols for virtual
private networks and unreliable SCTP for multimedia. These protocols
will add further overhead to transmitted packets and create new
requirements for efficient header compression.
EPIC requires minimal effort to generate a new ROHC profile. The
input to EPIC is the behavior of the new protocol stack (i.e. which
fields require which encoding techniques). EPIC then produces an
optimally efficient set of compressed header formats for use at the
compressor and decompressor.
8.2. New toolbox methods
The eight encoding techniques currently offered by the EPIC toolbox
are sufficient to compress the entire RTP/UDP/IP protocol stack
including the CSRC list and IP extension headers. However, if EPIC
is used to compress other protocol stacks then it may become
necessary to add new encoding methods to the toolbox.
In general, a new toolbox encoding method is just a mapping between
a set of uncompressed field values and their compressed equivalents.
The only requirement is that the mapping is 1-1 (in other words that
no two field values map onto the same compressed value). It is
acceptable for the mapping to change depending on the context,
although care must be taken in this case to ensure that it is
robust. For example, a DELTA encoding method that stores the
increase in the field value relative to the context is very
efficient for compressing incrementing fields. However it is not
robust to lost packets (since it fails if the decompressor context
is incorrect).
8.3. New control fields
It may also be necessary to extend EPIC by adding extra control
fields. Recall that control fields communicate information between
the compressor and decompressor that is not forwarded in the
uncompressed header. Since control fields are treated in exactly the
same manner as normal header fields (except that they are not
present in the uncompressed header), the only change required is a
description of how the control field is calculated at the compressor
and how it is used at the decompressor.
8.4. Learning version of EPIC
As stated in the introduction, the EPIC compressed header formats
are optimal for the chosen protocol stack. This means that it is
provably impossible to compress a protocol stack for which EPIC is
programmed more efficiently than EPIC itself does (for the chosen
header alignment and level of robustness).
An interesting question, however, is the effectiveness of EPIC when
compressing a protocol stack for which it is not programmed. If the
correct encoding methods have been assigned to each field but the
probabilities that they will be used are slightly inaccurate then
the efficiency lost is negligible.
But suppose that the protocol stack behaves in a significantly
different manner than expected by ROHC: for example if an RTP stream
uses padding or even RTP extensions. Static compressors with fixed
compressed header formats suffer a permanent drop in performance if
the protocol stack behaves in an unusual manner. However, since EPIC
can dynamically generate new packet formats as they are needed, it
is possible for an EPIC-generated ROHC profile to adapt its own
header formats to the incoming packet stream.
A basic version of "Learning" EPIC is straightforward to implement.
The encoding methods assigned to each field are not changed; instead
the probabilities that each encoding method will be used are refined
by counting the number of times they have been used by recent
packets. In this manner, if a particular compressed header format is
used more often than expected then the Huffman tree will be refined
to encode the format using fewer bits.
Fast algorithms exist for updating a Huffman tree when only a small
number of inputs have changed. Care must be taken to ensure that the
Huffman trees at the compressor and decompressor are updated in
step; this can be accomplished by updating the tree periodically at
the compressor and sending the refined version to the decompressor
within a special "profiling" message.
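The probability-refinement idea can be sketched with a plain
(non-hierarchical) Huffman rebuild from usage counts; the heap-based
construction and the format names are assumptions of this sketch.

```python
import heapq

def huffman_lengths(counts):
    """Rebuild Huffman code lengths from observed usage counts:
    formats seen more often end up with shorter codes. Plain Huffman,
    sketching the 'learning' refinement (not the full HH tree)."""
    heap = [(c, i, (name,)) for i, (name, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = {name: 0 for name in counts}
    while len(heap) > 1:
        c1, _, g1 = heapq.heappop(heap)
        c2, _, g2 = heapq.heappop(heap)
        for name in g1 + g2:
            lengths[name] += 1     # merged groups gain one code bit
        heapq.heappush(heap, (c1 + c2, len(lengths) + len(heap), g1 + g2))
    return lengths

# A format used far more often than expected gets a shorter code.
observed = {"fmt-A": 54, "fmt-B": 8, "fmt-C": 5, "fmt-D": 3}
lengths = huffman_lengths(observed)
assert lengths["fmt-A"] < lengths["fmt-D"]
```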
9. EPIC Processing
This chapter presents an outline description of the EPIC processing
without any algorithmic or computational notation. The detailed
pseudocode description of EPIC is presented in Appendix A.
There are four main aspects to the pseudocode:
Structure of the Huffman tree
Compressor operation
Decompressor operation
Tree construction
9.1. Structure of the Huffman Tree
Recall that EPIC uses a Hierarchical Huffman (HH) tree to succinctly
store a set of compressed header formats. A diagram of an HH tree can
be found in Figure 3.
The HH tree consists of a set of nodes joined by branches.
Conventionally the nodes are arranged vertically, with leaf nodes at
the top and the root node at the bottom.
There is a single root node, which is the starting point for
decompression operations. There is a set of leaf nodes, which are
used as the starting points for compression operations. Each
compressed header format is associated with a specific leaf node and
vice versa. All other nodes are interior nodes.
The information needed for compression and decompression is contained
in the tree topology (i.e. the connectivity of nodes and branches)
and the labels on each branch. Information is also associated with
each node whilst the tree is being constructed, but this can be
discarded once the tree has been built.
Note that this "tree" is multiply connected, i.e. there are several
alternative paths between any given leaf node and the root. The
correct route to follow is determined by the compression and
decompression algorithms.
An example of a bit-aligned HH tree is given in Figure 3. The example
considers a very simple protocol stack with two fields labeled A and
B. Four compressed header formats are available to compress this
protocol stack. Each header format is placed in a leaf node of the
Huffman tree, together with the probability "P" that the format will
be used to compress a header and the number "N" of distinct headers
that can be compressed using the format.
The procedure for generating the tree itself is described in Appendix
A.4.
Leaf 1 Leaf 2 Leaf 3 Leaf 4
A=STATIC, A=STATIC, A=IRREGULAR(1) A=STATIC,
B=IRREGULAR(2) B=READ(LTABLE,3) B=STATIC B=STATIC
--- --- --- ---
| 3%| | 8%| | 5%| |54%|
| 4 | | 3 | | 2 | | 1 |
--- --- --- ---
| X / D \ | X / 0
| / \ | /
| / \ | /
--- --- --- --- /
| 6%| | 8%| | 8%| |10%| /
| 2 | | 2 | | 1 | | 1 | /
--- --- --- --- /
| X | X 1 \ / 0 /
| | \ / /
| | \ / /
--- --- --- /
|12%| |16%| |18%| /
| 1 | | 1 | | 1 | /
--- --- --- /
1 \ / 0 / 0 /
\ / / /
\ / / /
--- / /
|28%| / /
| 1 | / /
--- / /
1 \ / /
\ / /
\ / /
\ / /
\ / /
--- /
|46%| /
| 1 | /
--- /
1 \ /
\ /
\ /
----
|100%|
| 1 |
----
Figure 3 : An example of a Hierarchical Huffman tree
9.2. Compressor operation
This section describes the steps of operation within the EPIC header
compressor. This will be done in 3 steps: the selection of the
relevant HH tree and leaf node, the compression itself, and the
addition of the uncompressed fields.
9.2.1. Step 1: Compression front end
The input to the EPIC compressor is a header from the protocol stack
to be compressed. The compressor maintains a number of contexts as
described in Section 5.1.3 of [ROHC5], and the correct context to use
should be selected depending on the uncompressed header.
The uncompressed header should then be divided up into fields. As per
[ROHC5] Section 5.7, the term hdr(X) will refer to the value of Field
X in the uncompressed header.
Next the compressor chooses an encoding method from the EPIC toolbox
for each field in the protocol stack (for example the IP-ID might be
encoded using the "IRREGULAR" function etc.) from the set of
possibilities defined by the inputset. The choice of encoding methods
can then be mapped onto a compressed header format. There is one
compressed header format for every possible way the fields can be
encoded (unless the number of formats is limited due to memory
constraints, in which case a DYNAMIC packet can be sent instead).
In general the only requirement is that the format must be capable of
transmitting enough information to reconstruct the header at the
decompressor. If more than one suitable compressed header format is
available then the compressor may choose any of these.
9.2.2. Step 2: Compressing the uncompressed header
The compression process works down the Huffman tree, making decisions
at each node based on the choice of encoding method for each field.
The process finishes when there are no more branches to follow. The
result is a string of integers (known as codewords) that each contain
ALIGN bits of the compressed header.
9.2.3. Step 3: Adding the UNCOMPRESSED fields
The third step is to add the uncompressed fields to the end of the
compressed header. The final task is to encapsulate the EPIC
compressed header within a standard ROHC packet as illustrated in
Figure 1. The packet with compressed header is then ready to be
transmitted.
9.3. Decompressor operation
This section describes the steps of operation within the EPIC header
decompressor.
9.3.1. Step 1: Decompressing the compressed header
Each codeword is extracted from the received compressed header.
Starting at the root of the Huffman tree, branches are followed up
the tree and decisions are taken at each node based on the current
codeword. When the top of the tree is reached, the leaf node
indicates which compressed header format has been used. This in turn
gives the encoding method that has been used for every field in the
header.
These values are then used to find the format of the original field,
with which reconstruction of the original field value can occur. This
process continues until all codewords have been dealt with.
9.3.2. Step 2: Reconstructing the INFERRED fields
The final step at the decompressor is to reconstruct the UNCOMPRESSED
and INFERRED fields from the transmitted fields and/or the context.
In general the reconstruction process is specific to the individual
fields.
The normative pseudocode description of EPIC is given in Appendix A.
10. Security Considerations
EPIC generates compressed header formats for direct use in ROHC
profiles. Consequently the security considerations for EPIC match
those of [ROHC5].
11. Acknowledgements
Header compression schemes from [ROHC5] have been important sources
of ideas and knowledge. Basic Huffman encoding [HUFF] was enhanced
for the specific tasks of robust, efficient header compression.
Thanks to
Carsten Bormann (cabo@tzi.org)
Max Riegel (maximilian.riegel@icn.siemens.de)
for valuable input and review.
12. Intellectual Property Considerations
This proposal is in full conformance with [RFC-2026].
Siemens may have patent rights on technology described in this
document which employees of Siemens contribute for use in IETF
standards discussions. In relation to any IETF standard incorporating
any such technology, Siemens hereby agrees to license on fair,
reasonable and non-discriminatory terms, based on reciprocity, any
patent claims it owns covering such technology, to the extent such
technology is essential to comply with such standard.
13. References
[ROHC5] "RObust Header Compression (ROHC)", Carsten Bormann et
al, <draft-ietf-rohc-rtp-05.txt>, Internet Engineering
Task Force, October 23, 2000
[HUFF] "The Data Compression Book", Mark Nelson and Jean-Loup
Gailly, M&T Books, 1995
[IMPL] http://www.roke.co.uk/networks/epic/epic.html
[RFC-2026] "The Internet Standards Process - Revision 3", Scott
Bradner, Internet Engineering Task Force, October 1996
[RFC-2119] "Key words for use in RFCs to Indicate Requirement
Levels", Scott Bradner, Internet Engineering Task
Force, March 1997
14. Authors' Addresses
Richard Price Tel: +44 1794 833681
Email: richard.price@roke.co.uk
Robert Hancock Tel: +44 1794 833601
Email: robert.hancock@roke.co.uk
Stephen McCann Tel: +44 1794 833341
Email: stephen.mccann@roke.co.uk
Mark A West Tel: +44 1794 833311
Email: mark.a.west@roke.co.uk
Abigail Surtees Tel: +44 1794 833131
Email: abigail.surtees@roke.co.uk
Roke Manor Research Ltd
Romsey, Hants, SO51 0ZN
United Kingdom
Christian Schmidt Tel: +49 89 722 27822
Email: christian.schmidt@icn.siemens.de
Siemens ICM N MR MA1
Munich
Germany
Appendix A. EPIC Description in Pseudocode
Chapter 9 presented an outline description of the operation of the
EPIC scheme. This appendix goes on to describe the complete EPIC
scheme in pseudocode.
As stated in Section 9, there are four main aspects to EPIC:
Description of the structure of the Hierarchical Huffman (HH) tree
Compressor operation
Decompressor operation
Tree construction
The final section of the appendix describes certain implementation
issues.
A.1. Structure of the Hierarchical Huffman tree
As mentioned in Section 9, the HH tree consists of a set of nodes
joined by branches. Conventionally, the nodes are arranged vertically
with leaf nodes at the top and the root node at the bottom.
There is a single root node, which is the starting point for
decompression operations. There is a set of leaf nodes, which are
used as the starting points for compression operations. Each possible
compressed header format is associated with a specific leaf node. All
other nodes are interior nodes.
The format indicator for a leaf node is in fact just a list of
encoding methods, one for each field in the header. In what follows,
we notate the encoding for a single field as "F=E", for a field
called "F" using encoding method "E". The complete format indicator
will then look like {F1=E1, F2=E2, ...}. Although we start off in the
inputset with encoding methods expressed recursively, by the time the
tree has been constructed all the Ei are actually toolbox encoding
methods.
No special information is associated with any of the nodes.
(Information is associated with each node while the tree is being
constructed, but this can all be discarded once the construction is
complete.) All the information is contained in the tree topology (the
connectivity of nodes and branches) and labels on each branch.
There are three possible cases for branches between nodes at
different levels:
i) A single vertical branch between a pair of nodes. Such a branch
is labeled {X}.
ii) A pair of branches joining one upper node to two lower ones.
These branches are labeled {D t} for some integer t (different
for each branch).
iii) Several branches each joining one upper node to a common lower
node. Each branch is labeled with {t} for some integer t
(different for each branch).
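One possible in-memory representation of this structure, with the
three branch label forms carried as a tagged field (names and layout
are assumptions of this sketch):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """HH tree node: only topology survives construction, so the node
    itself carries nothing but its downward branches."""
    branches: List["Branch"] = field(default_factory=list)

@dataclass
class Branch:
    """A labeled branch to the next node down the tree:
    kind='X' -> label {X}     (t unused)
    kind='D' -> label {D t}   (one of a pair)
    kind='L' -> label {t}     (one of several to a common lower node)"""
    kind: str
    t: Optional[int]
    lower: "Node"

# Case (ii): a pair of {D t} branches to two distinct lower nodes.
left, right = Node(), Node()
upper = Node([Branch("D", 0, left), Branch("D", 3, right)])
assert [b.t for b in upper.branches] == [0, 3]
```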
Note that this "tree" can actually be multiply connected, i.e. there
may be several alternative paths between any given leaf node and the
root. The correct route to follow is determined by the compression
and decompression algorithms.
As well as the tree itself, the following input parameters are needed
to begin the compression and decompression process:
ALIGN: Number of bits for alignment (i.e. size of one
codeword in the compressed header).
NPATTERNS: Number of bit patterns available for use by EPIC in the
first codeword of the compressed header.
A.2. Compressor
This section describes the EPIC header compressor.
A.2.1. Step 1: Compression front end
The input to the EPIC compressor is a header from the protocol stack
to be compressed. The compressor maintains a number of contexts as
described in Section 5.1.3 of [ROHC5], and the correct context to use
should be selected depending on the uncompressed header.
The uncompressed header should then be divided up into fields. As per
[ROHC5] Section 5.7, the term hdr(X) will refer to the value of Field
X in the uncompressed header.
The compressor chooses a compressed header format from the possible
set (where each compressed header format corresponds to a different
leaf node in the HH tree). In general the format must be capable of
transmitting enough information to reconstruct the header at the
decompressor. If more than one suitable format is available then the
compressor may choose any of these.
The compressor then calculates a non-negative integer M that contains
all of the non-redundant information from the uncompressed header. M
is calculated using the format indicator for the chosen leaf node.
The following steps are taken:
1 Set M=0
2 From the left, search the format indicator for the next instance
of "READ" or "WRITE" encoding methods.
while (A new instance of "READ" or "WRITE" can be found)
{
i) Let F denote the corresponding field and let i
denote the corresponding parameter (so the format
indicator contains "F=READ(LTABLE,i)" or
"F=WRITE(k,LTABLE,i)" for some k,LTABLE).
ii) Let M=M*i
iii) Let M=M+index where index is an integer from 0 to
i-1. It denotes the position of hdr(F) in the
lookup table and is fixed for READ but chosen by
the compressor for WRITE.
}
3 From the left, search the format indicator for the next instance
of "LSB", "IRREGULAR" or "WRITE".
while (A new instance of "LSB", "IRREGULAR" or "WRITE" encoding
methods can be found)
{
i) Let F denote the corresponding field and let k
denote the corresponding parameter (so the format
indicator contains "F=LSB(k,p)", "F=IRREGULAR(k)"
or "F=WRITE(k,LTABLE,i)").
ii) Let M=M*2^k
iii) Let M=M+value, where value is from 0 to 2^k - 1.
For IRREGULAR and WRITE encoding it is hdr(F) in
integer form. For LSB encoding it is the LSB of
hdr(F) as described in Section 4.5.1 of [ROHC5].
}
The end result is the M value for the header, along with the
identified starting leaf node.
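The two passes above can be sketched in Python; the triple-based
representation of the format indicator is an assumption of this
sketch.

```python
def compute_m(format_indicator, hdr, table_index):
    """Sketch of the compression front end (steps 2 and 3 above).
    format_indicator: (field, method, params) triples, left to right.
    hdr: field name -> integer value.
    table_index: field name -> lookup table position."""
    m = 0
    # Pass 1: lookup-table positions (READ and WRITE), left to right.
    for name, method, params in format_indicator:
        if method in ("READ", "WRITE"):
            m = m * params["i"] + table_index[name]
    # Pass 2: raw bit values (LSB, IRREGULAR and WRITE), left to right.
    for name, method, params in format_indicator:
        if method in ("LSB", "IRREGULAR", "WRITE"):
            k = params["k"]
            m = m * (1 << k) + (hdr[name] & ((1 << k) - 1))
    return m

fmt = [("SN", "LSB", {"k": 4, "p": 0}),
       ("SPI", "READ", {"i": 5})]
# Pass 1 gives M=2 (table position), pass 2 folds in the 4 LSBs of SN.
assert compute_m(fmt, {"SN": 9}, {"SPI": 2}) == 2 * 2**4 + 9
```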
A.2.2. Step 2: Compressing the uncompressed header
Working from the M value and start node, the encoding process creates
a sequence {codeword(i)} of codewords as follows:
1 Begin at the start node in the tree corresponding to the header
format. Set i=1.
2 Follow the branches of the tree downwards towards the root,
taking the following steps at each node:
a) If the next branch label is {X} then:
i) codeword(i) = M modulo 2^ALIGN, i=i+1
ii) set M = integer part (M / 2^ALIGN)
b) If there are 2 branches with labels {D N1}, {D N2} then:
i) d = absolute difference between N1 and N2
ii) if M < d then follow branch to left
iii) else set M = M - d and follow right branch
c) If the next branch label is a number {L} then:
i) codeword(i) = L + M, i=i+1
ii) set M = 0
3 Finish when there are no more branches to follow, set
NCODEWORDS=i (i.e. NCODEWORDS is the total number of codewords
in the compressed header).
4 Reverse the list of codewords, so codeword(1) is the last
codeword produced, and codeword(NCODEWORDS) is the first.
This produces a string of codeword numbers, where each codeword
represents ALIGN bits in the EPIC compressed header. For the most
likely header formats this string may contain only a single codeword.
It is a property of EPIC that the following hold:
For i=1: 0 <= codeword(i) < NPATTERNS
For i=2, ..., NCODEWORDS: 0 <= codeword(i) < 2^ALIGN
This means that the compressed header can be very simply transmitted
with implicit codeword alignment. The values to be placed in the
first codeword are limited to be numerically less than NPATTERNS,
which means that part of this codeword can be used for other
purposes.
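For a path consisting only of {X} branches, steps 2a, 3 and 4 reduce
to peeling ALIGN-bit digits off M and reversing them. A minimal
sketch, ignoring {D t} and {t} branches:

```python
ALIGN = 8   # example alignment: one codeword per octet

def split_codewords(m, n_x_branches):
    """Peel one ALIGN-bit codeword off M per {X} branch (step 2a),
    then reverse the list for transmission (step 4). Sketch only:
    a real path also traverses {D t} and {t} branches."""
    codewords = []
    for _ in range(n_x_branches):
        codewords.append(m % (1 << ALIGN))
        m //= 1 << ALIGN
    codewords.reverse()
    return codewords

assert split_codewords(0x1234, 2) == [0x12, 0x34]
```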
A.2.3. Step 3: Adding the UNCOMPRESSED fields
The last step is to add the UNCOMPRESSED fields to the end of the
compressed header. This is accomplished as follows:
1 From the left, search the format indicator for the next instance
of "UNCOMPRESSED".
while (A new instance of "UNCOMPRESSED" can be found)
{
i) Let F denote the corresponding field.
ii) Add hdr(F) to the compressed header.
}
If the compressed header is no longer a multiple of ALIGN bits then
it should be padded by adding "0" bits onto the end.
The final task is to encapsulate the EPIC compressed header within a
standard ROHC packet as described in Section 3.2. The packet with
compressed header is then ready to be transmitted.
A.3. Decompressor
This section describes the EPIC header decompressor.
A.3.1. Step 1: Decompressing the compressed header
The list C=(codeword(1), ... codeword(NCODEWORDS)) can be read
directly from the compressed header. Note that the integer NCODEWORDS
(i.e. the length of the compressed header excluding UNCOMPRESSED
fields) is not yet known, but instead will be determined when the
compressed header is parsed.
The list C and the Huffman tree are used to determine the compressed
header format and the M value as follows:
1) Set M = 0 and i = 1
2) Begin at root of tree (single node) and follow the branches up
the tree taking the following steps at each node:
a) If next branch label is {X} then:
i) set M = M * 2^ALIGN
ii) set M = M + codeword(i)
iii) i=i+1
b) If next branch label is {D t} then:
i) if t = 0 just continue up the tree
ii) else set M = M + t and continue up tree
c) If there is more than one branch, each with a label {t}, then:
i) set c = codeword(i), i=i+1
ii) find neighboring pair of branches with labels
{L} and {U} such that L <= c < U (if c >= number
on every branch then take L = number on branch
with greatest number)
iii) set M = c - L
iv) continue up tree on branch labeled {L}
3) End up at a leaf node of the tree corresponding to a particular
compressed header format and value M
The value M and the header format can then be provided to the
decompression front end of Section A.3.2. The value NCODEWORDS does
not need to be transmitted explicitly, since the process terminates
of its own accord when a leaf node of the tree is reached.
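As a toy illustration (not from the draft), the {X} rule of step 2a on
its own reconstructs M from a chain of ALIGN-bit codewords;
accumulate_x_branches is a hypothetical name and ALIGN = 8 is assumed.

```python
ALIGN = 8

def accumulate_x_branches(codewords):
    # follow a chain of {X} branches, applying steps 2a(i) and 2a(ii)
    # at each node visited
    M = 0
    for c in codewords:
        M = M * 2 ** ALIGN  # i)  shift M up by one codeword
        M = M + c           # ii) fold in the next codeword
    return M

M = accumulate_x_branches([1, 2, 3])
```

This is the inverse of the compressor's repeated division: the list C
is consumed in order because the compressor reversed its codewords in
its final step.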
A.3.2. Step 2: Decompression front end
The decompressor begins with the non-negative integer M which is then
translated back into a value hdr(X) for each field X using the format
indicator. The following steps are taken:
1 Begin with M
2 From the right, search the format indicator for the next
instance of "LSB", "IRREGULAR" or "WRITE" encoding methods.
while (A new instance of "LSB", "IRREGULAR" or "WRITE" can be
found)
{
i) Let F denote the corresponding field and let k
denote the corresponding parameter (so the format
indicator contains "F=LSB(k,p)", "F=IRREGULAR(k)" or
"F=WRITE(k,LTABLE,i)").
ii) Let value=M mod 2^k (so value is an integer from 0
to 2^k - 1). For IRREGULAR and WRITE encoding let
hdr(F) be the k bits corresponding to value. For
LSB encoding let hdr(F) be context(F) with the last
k LSBs at offset p replaced by value as described
in Section 4.5.1 of [ROHC5].
iii) M=(M-value)/(2^k).
}
3 From the right, search the format indicator for the next
instance of "READ" or "WRITE" encoding methods.
while (A new instance of "READ" or "WRITE" can be found)
{
i) Let F denote the corresponding field and let i
denote the corresponding parameter (so the format
indicator reads "F=READ(LTABLE,i)" or
"F=WRITE(k,LTABLE,i)" for some k,LTABLE).
ii) Let index=M mod i (so index is an integer from 0 to
i - 1). For READ encoding let hdr(F) be the value
stored in LTABLE at the position specified by
index. For WRITE encoding store the value hdr(F) in
LTABLE at the position specified by index.
iii) M=(M-index)/i.
}
4 From the right, search the format indicator
for the next field encoded as "STATIC" or "VALUE".
while (A new instance of "STATIC" or "VALUE" can be found)
{
i) Let F denote the corresponding field (so the format
indicator reads "F=STATIC" or "F=VALUE(v)" for some
bit pattern v).
ii) For STATIC encoding let hdr(F)=context(F). For
VALUE encoding let hdr(F)=v.
}
The end result is that hdr(F) has been determined for all fields
other than UNCOMPRESSED and INFERRED fields.
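An illustrative sketch of the step 2 pass, treating each field as a
plain k-bit value (the LSB offset handling and the lookup tables of
steps 2 and 3 are omitted for brevity; decode_front_end and the field
names are hypothetical).

```python
def decode_front_end(M, fields):
    # fields: (name, k) pairs in format-indicator order; the pass
    # processes them from the right, as the pseudocode does
    hdr = {}
    for name, k in reversed(fields):
        value = M % 2 ** k          # step 2(ii): extract k bits
        hdr[name] = value
        M = (M - value) // 2 ** k   # step 2(iii): strip them from M
    return hdr

# M = 0b1010011 decodes as A = 0b101, B = 0b0011
hdr = decode_front_end(0b1010011, [("A", 3), ("B", 4)])
```

Step 3 works identically except that the divisor is the lookup-table
parameter i rather than a power of two, so the overall decode is a
mixed-radix expansion of M.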
A.3.3. Step 3: Reconstructing the UNCOMPRESSED and INFERRED fields
The final step at the decompressor is to reconstruct the UNCOMPRESSED
and INFERRED fields from the transmitted fields and/or the context.
In general the reconstruction process is specific to the individual
fields (and is implemented using a separate subroutine for each). For
the UNCOMPRESSED fields it is necessary to know the lengths of each
field, so that the information can be retrieved from the end of the
compressed header. For the INFERRED fields the procedure for
calculating the actual value of the field must be known.
A.4. Hierarchical Huffman tree construction
This section describes how EPIC converts an inputset into a new set
of compressed header formats. We first introduce basic concepts and
notation that are used in the construction of the HH tree.
An FNP table is a 3-column table which represents information about
the behavior of a particular data item (e.g. a single field, group of
fields, or a complete protocol stack). Each row of the table is
called a format and contains the following:
A format indicator (F): This ties the row to a particular format of
the data item;
A number of codepoints (N): This is the number of codepoints that is
needed to represent the compressed form of the data item when it has
this format (N measures the amount of non-redundant information in
the data item);
A scaled probability (P): This represents the probability of the data
item having this format, divided by N.
An FNP table has associated with it a total number of codepoints,
CODEPOINTS(table), which is just the sum of the N values for all the
rows in the table. The set of data formats given in the table MUST be
"exhaustive", which means that the sum of N*P over the table MUST be
unity. This ensures that the table contains every possible format for
the data item. The order of the rows in the table is significant.
There are two types of FNP tables: fundamental tables and constructed
tables.
Fundamental FNP tables correspond to fields described by a single
item in the EPIC toolbox. The FNP tables for all of these types are
listed in the following table. Each FNP table has a single row, so
N*P=1 directly (and hence the FNP table is exhaustive).
+---------------------------+---------+-----------+
| F | N | P |
+---------------------------+---------+-----------+
| "Field1=STATIC" | 1 | 1 |
+---------------------------+---------+-----------+
| "Field1=IRREGULAR(k)" | 2^k | 1/2^k |
+---------------------------+---------+-----------+
| "Field1=VALUE(v)" | 1 | 1 |
+---------------------------+---------+-----------+
| "Field1=READ(LTABLE,i)" | i | 1/i |
+---------------------------+---------+-----------+
| "Field1=WRITE(k,LTABLE,i)"| i*2^k | 1/(i*2^k) |
+---------------------------+---------+-----------+
| "Field1=LSB(k,p)" | 2^k | 1/2^k |
+---------------------------+---------+-----------+
| "Field1=UNCOMPRESSED" | 1 | 1 |
+---------------------------+---------+-----------+
| "Field1=INFERRED" | 1 | 1 |
+---------------------------+---------+-----------+
Note that the name "Field1" should be replaced by the correct name
for the field (e.g. "RTP Timestamp", "ESP Sequence Number" etc.).
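The table above can be generated mechanically; a sketch using exact
fractions so that the exhaustiveness check N*P = 1 holds precisely
(fundamental_row is an illustrative helper, not from the draft).

```python
from fractions import Fraction

def fundamental_row(method, k=0, i=1):
    # N per the fundamental FNP table; P is always 1/N, so each
    # one-row fundamental table is exhaustive by construction
    if method in ("STATIC", "VALUE", "UNCOMPRESSED", "INFERRED"):
        n = 1
    elif method in ("IRREGULAR", "LSB"):
        n = 2 ** k
    elif method == "READ":
        n = i
    elif method == "WRITE":
        n = i * 2 ** k
    else:
        raise ValueError(method)
    return (method, n, Fraction(1, n))

for m in ("STATIC", "IRREGULAR", "READ", "WRITE"):
    f, n, p = fundamental_row(m, k=4, i=3)
    assert n * p == 1
```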
A.4.1. Step 1: Process the EPIC inputset
The input to the packet format generator is a list of general
parameters and field-specific parameters, referred to as the inputset
(and described in Chapter 5).
The parameters used in the packet format generator stage are as
follows. For the general parameters:
ALIGN: Number of bits for alignment (i.e. size of one
codeword in the compressed header)
NPATTERNS: Number of bit patterns available for use by EPIC in the
first codeword of the compressed header
MAXFORMATS: Maximum number of compressed header formats (optional)
PTHRESH: Minimum probability threshold (optional)
In addition, we need to derive the following quantities:
AFORMATS: Actual number of header formats used
PMISSING: The total probability of all formats discarded during
tree FNP-table generation. (PMISSING is not actually
used during the tree construction, but can be useful
in validation checks during the process.)
The field-specific parameters are the encoding methods for the
headers which the profile is intended to compress. The information in
these encoding methods is used to make successively more complex FNP-
tables following either of the SUM or PRODUCT styles described in the
next section. The end result is the top-level FNP-table which
describes the complete header stack for the profile. This is suitable
input for the HH algorithm.
A.4.2. Step 2: Resolve complex encoding methods
Constructed FNP tables are used to represent more complex data
behavior. Complexity is introduced either by allowing more than one
behavior for a single item, or by considering the behavior of
multiple data items. These two methods correspond to two styles for
constructing FNP tables, which we refer to as SUM-style or PRODUCT-
style respectively. In each case the construction process implicitly
defines the order of the rows in the constructed table, based on the
information that describes the construction and the order of the rows
in each constituent table.
SUM-style construction is typically used to describe a variety of
possible behaviors of a single data item (e.g. 80% of the time
static, remaining 20% alternating between a set of values). SUM-
style construction takes as input a sequence {TABLE(i), P(i)}, where
each TABLE is a known FNP table for the data item, and each P(i) is
the unscaled probability that the behavior of the data item being
described by TABLE(i) occurs (so the sum over the sequence of P(i)
is unity). The new table is constructed using the following
mechanism:
Initialize the constructed table to be empty
for each element i of the sequence {TABLE(i), P(i)}
{
for each row in TABLE(i)
{
Add a new row to the constructed table with
the following entries:
F = Format from row of TABLE(i)
N = corresponding entry for N from row of TABLE(i)
P = P(i)*corresponding entry for P from row of TABLE(i)
}
}
The number of codepoints in the constructed table is the sum of the
numbers of codepoints from all the constituent tables, and similarly
for the number of rows in the table.
The second type of construction is PRODUCT-style. This type of
construction is used to describe a data item composed of more than
one field. It takes as an input a simple sequence of {TABLE(i)},
where each TABLE(i) is a known FNP table, and there are NT tables in
all. The new table is constructed using the following mechanism:
Initialize the constructed table to be empty
for each row i1 of TABLE(1)
{
for each row i2 of TABLE(2)
{
...
for each row iNT of TABLE(NT)
{
Add a new row to the constructed table with the
following entries:
F = {Format i1 of TABLE(1), Format i2 of
TABLE(2), ... , Format iNT of TABLE(NT)}
N = [N(i1) of TABLE(i1)]* ... *[N(iNT) of TABLE(iNT)]
P = [P(i1) of TABLE(i1)]* ... *[P(iNT) of TABLE(iNT)]
}
...
}
}
The number of codepoints in the constructed table is the product of
the numbers of codepoints from all the constituent tables, and
similarly for the number of rows in the table.
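The two constructions above can be sketched directly, assuming each
FNP table is a list of (F, N, P) rows with exact Fraction
probabilities (the helper names are illustrative, not from the draft).

```python
from fractions import Fraction
from itertools import product

def sum_style(tables_with_p):
    # tables_with_p: sequence of (TABLE(i), P(i)); the P(i) sum to 1
    out = []
    for table, p_i in tables_with_p:
        for f, n, p in table:
            # N is carried over; P is scaled by the branch probability
            out.append((f, n, p_i * p))
    return out

def product_style(tables):
    # each combined row multiplies N and P over its constituent rows,
    # nesting the loops in table order as in the pseudocode
    out = []
    for rows in product(*tables):
        f = tuple(r[0] for r in rows)
        n, p = 1, Fraction(1)
        for _, ni, pi in rows:
            n, p = n * ni, p * pi
        out.append((f, n, p))
    return out

# a STATIC-like one-row table and an IRREGULAR(2)-like table
static = [("Field1=STATIC", 1, Fraction(1))]
irr = [("Field1=IRREGULAR(2)", 4, Fraction(1, 4))]
mixed = sum_style([(static, Fraction(4, 5)), (irr, Fraction(1, 5))])
```

Both constructions preserve exhaustiveness: the sum of N*P over the
constructed table remains unity whenever it is unity for the inputs.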
The two types of construction steps can be repeated indefinitely to
build up progressively more complex data formats. The basis is always
the set of toolbox formats. The end result is known as the top-level
FNP table, which describes the whole protocol stack to be compressed.
A.4.3. Step 3: Tree pruning
Rows are now pruned from the top-level FNP table and alignment is
enforced using the following procedure.
1. Sort the rows of the table in order of decreasing N*P. Where there
are ties in N*P, preserve the original ordering of the table.
2. If PTHRESH is given, discard all rows from the table for which N*P
< PTHRESH, and add the values of N*P to the variable PMISSING.
3. If MAXFORMATS is given, discard all rows from the table other than
the first MAXFORMATS-1, and add the values of N*P to PMISSING.
Then re-order the complete set of formats using the original
ordering of encodings (ignoring probability values).
4. Set AFORMATS = (number of rows remaining in the table) + 1, and
NPOINTS = sum of all N values for all rows in the table.
5. Create a new terminal row, number AFORMATS, with
P = 0
N = (NPATTERNS - NPOINTS) mod (2^ALIGN-1)
The format indicator for this row is not used.
Note that in practice, it is not necessary to exhaustively generate
the completed table before pruning starts (i.e. some short cuts are
possible). However, for interoperability, care must be taken to
preserve the order that the standard construction method would
generate.
This stage ends with a complete set of starting leaf nodes for the
tree, with low-probability formats discarded. The nodes have
assigned N and P values and associated header format indicators.
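A sketch of the pruning procedure, assuming the top-level table is a
list of (F, N, P) rows with exact Fraction probabilities. The step 3
re-ordering by original encoding order is simplified away, and the
terminal-row formula is copied verbatim from step 5.

```python
from fractions import Fraction

def prune(table, align, npatterns, pthresh=None, maxformats=None):
    pmissing = Fraction(0)
    # 1. sort by decreasing N*P; Python's sort is stable, so the
    #    original ordering is preserved where N*P values tie
    rows = sorted(table, key=lambda r: r[1] * r[2], reverse=True)
    # 2. discard rows below the probability threshold
    if pthresh is not None:
        kept = []
        for r in rows:
            if r[1] * r[2] < pthresh:
                pmissing += r[1] * r[2]
            else:
                kept.append(r)
        rows = kept
    # 3. cap the number of formats, leaving one slot for the terminal
    #    row (re-ordering by original encoding order omitted here)
    if maxformats is not None:
        for r in rows[maxformats - 1:]:
            pmissing += r[1] * r[2]
        rows = rows[:maxformats - 1]
    # 4./5. append the terminal row; its format indicator is unused
    npoints = sum(r[1] for r in rows)
    terminal_n = (npatterns - npoints) % (2 ** align - 1)
    rows.append(("TERMINAL", terminal_n, Fraction(0)))
    return rows, pmissing
```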
A.4.4. Step 4: Creating the HH Tree
Creating a Hierarchical Huffman tree begins with a set of nodes, each
one having a number of codepoints (N) and a probability per codepoint
(P). During the creation of the tree new nodes may be created and
given a number of codepoints and a probability. These will then be
incorporated into the tree (i.e. the new nodes are incorporated into
the active set out of which the tree is growing, and old nodes which
are already part of the tree may be removed from this active set). The
tree building finishes when the active set consists of a single node
which contains all the codepoints and has probability 1.
The format indicator of each leaf node is not used for tree
construction (but instead must be stored for use in compression and
decompression).
We denote a node as NODE(k) for k=1, 2, 3 ...
NODE(k) has number of codepoints N(NODE(k))=n[k]
NODE(k) has probability per codepoint of P(NODE(k))=p[k]
The initial nodes in the active set correspond to the formats from
the top-level FNP table: taking on their order, number of codepoints
and probabilities. Each codepoint corresponds to one possible header
within the given header format. Conventionally we place these leaf
nodes at the top of the tree and add new nodes working downwards.
Note that the total probability of NODE(k) is n[k]*p[k], and we start
with the total probability of all nodes in the active set being
1 - PMISSING. As new nodes are added to the set and old ones removed
or replaced, this property is conserved at each stage.
As the tree is built up, nodes are joined by branches. There are
three possible cases for branches between nodes at different levels:
i) A single vertical branch between a pair of nodes. Such a
branch is labeled {X}.
ii) A pair of branches joining one upper node to two lower ones.
These branches are labeled {D t} for some integer t
(different for each branch).
iii) Several branches each joining one upper node to a common
lower node. Each branch is labeled with {t} for some integer
t (different for each branch).
The tree generation algorithm is as follows:
1. Set NNODES = AFORMATS.
2. Find node j such that n[j] > 0 and p[j] is minimal (if there are
equalities then use smallest j). If there is no such j, STOP.
3. if (n[j] is divisible by 2^ALIGN)
{
/* Case A - Figure 4 */
i) replace NODE(j) in the active set with a new node j with
new p[j] = 2^ALIGN * p[j]
new n[j] = n[j] / 2^ALIGN
ii) join the new NODE(j) to original NODE(j)
iii)label branch with {X}
}
else /* n[j] not divisible by 2^ALIGN */
{
if (n[j] > 2^ALIGN)
{
/* Case B - Figure 5 */
i) create two new nodes, a replacement NODE(j) and NODE(r)
with r=NNODES+1, and set:
p[r] = p[j]
n[r] = n[j] modulo 2^ALIGN
new p[j] = p[j]
new n[j] = n[j] - n[r]
ii) join these nodes to original NODE(j) (left branch going
to new NODE(r) and right branch going to NODE(j))
iii)label branch to left with {D 0} and branch to right with
{D n[r]}
iv) set NNODES=NNODES+1
}
else /* so n[j] < 2^ALIGN */
{
/* Case C - Figure 6 */
i) find nodes k[x], x=1, 2, 3...,b with the following
properties:
all k[x] different and k[x] != j;
p[k[x]] are minimal among the available nodes and
increasing with x;
if there are ties among p[k[x]], order the k[x] affected
by value of k[x];
b is minimal such that n[j]+n[k[1]]+...+n[k[b]]>2^ALIGN.
ii) define S[i] = n[j] + sum over (1<=x<=i-1) of n[k[x]]
iii) create two new nodes, a replacement NODE(k[b]) and
NODE(r) with r=NNODES+1, and set:
p[r] = p[k[b]]
n[r] = 2^ALIGN - S[b]
new p[k[b]] = p[k[b]]
new n[k[b]] = n[k[b]] - n[r]
iv) join NODE(k[b]) and NODE(r) to original NODE(k[b]) (left
branch going to NODE(r) and right branch going to
NODE(k[b]))
v) label branch to left with {D 0} and branch to right with
{D n[r]}
vi) create a replacement for NODE(j) and set
new n[j] = 1
new p[j] = n[j]*p[j] + n[r]*p[r] +
sum(1<= x<= b-1) of n[k[x]]*p[k[x]]
vii) join this node to the original NODE(j), to NODE(k[1]) to
NODE(k[b - 1]) and NODE(r)
viii)label branch to original NODE(j) with {0}
ix) for 1<=x<=b-1, label branch to NODE(k[x]) with {S[x]}
x) label branch to node r with {S[b]}
xi) remove NODE(k[1]) to NODE(k[b-1]) and NODE(r) from the
active set
xii) Set NNODES=NNODES+1
}
}
4. Return to 2
Former NODE(j)
{p, n}
|
| {X}
|
|
New NODE(j)
{p[j]=p*2^ALIGN, n[j]=n/2^ALIGN}
Figure 4 : Tree Building Case A
Former NODE(j)
{p, n}
/\
/ \
/ \
/ \
/ \
{D 0}/ \{D n[r]}
/ \
/ \
/ \
NODE(r) New NODE(j)
{p[r]=p, n[r]=n mod 2^ALIGN} {n[j]=n-n[r], p[j]=p}
Figure 5 : Tree Building Case B
Former NODE(k[1])...NODE(k[b-1]) Former
NODE(j) | | NODE(K[b])
\ | | /\
\ | | / \
\ | | {D 0} / \ {D n[r]}
\ | | / \
\ | | / \
\ {S[1]}| |{S[b-1]} / \
\ | | NODE(r) New
\ | | {p, n} NODE(k[b])
\ | | /
{0} \ | | /
\ | | /
\ | | /
\ | | /
\ | | /{S[b]}
\ | | /
\ | | /
\ | | /
\ | | /
New NODE(j)
Figure 6 : Tree Building Case C
(See text for N and P values of created nodes)
The HH tree is now ready for use in the compressor and decompressor.
A.5. EPIC implementation issues
This section considers some issues that arise when implementing
EPIC.
A.5.1. Arithmetic
The implementation of all stages of the EPIC algorithm can be
performed using simple arithmetic operations on integer and floating
point numbers. However, some aspects of the algorithm require care
when being implemented on a real microprocessor.
The compression/decompression front end works in terms of M values
which are described in the pseudocode as large integers. However, the
only operations required are very simple bit pattern manipulations:
- left and right shift by ALIGN bits;
- bitwise AND with (2^ALIGN-1);
- addition to, subtraction from, and comparison with "ordinary"
integers.
Given that ALIGN will typically be 8, this represents very limited
complexity and does not require specialized support in the data path.
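As a sketch, assuming ALIGN = 8 and using Python's unbounded integers
to stand in for the bit pattern M:

```python
ALIGN = 8
MASK = 2 ** ALIGN - 1          # bitwise AND with (2^ALIGN - 1)

M = 0x1234
low = M & MASK                 # extract the low-order codeword
M_shifted_right = M >> ALIGN   # right shift by ALIGN bits
M_shifted_left = M << ALIGN    # left shift by ALIGN bits
```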
The calculation of the Huffman tree is more involved, since the
numbers of codepoints can be large integers and the probability
values are conceptually floating point. The additional operations
required are the multiplication and addition of these numbers with
themselves and each other, and comparison of probability values.
Support for large integer multiplication is not particularly
problematic since it only has to be performed at tree generation time
(which could even be an off-line activity); however, if rounding
error leads to inconsistent floating point orderings then
interoperability may be compromised.
There are two solutions to this problem. Firstly, for a given (fixed)
profile it is possible to construct the tree and also propagate upper
bounds on the rounding error for the value at each node. It is then
normally possible to verify that rounding cannot have modified the
node selections made. This is particularly useful if tree pruning
measures are taken to eliminate header formats with very small
probabilities. Note that once the tree has been constructed the
probability (and indeed codepoint numbers) for the tree nodes are no
longer necessary, so this process can be carried out off-line if
necessary.
The second approach is to note that the only operations involving
floating point values are multiplication and addition. Therefore, if
the basic input probabilities (the scaled values in the fundamental
FNP tables of the toolbox, and P values from profiles) are expressed
explicitly as multiples of 1/2^m (for any m), the calculations can
be carried out exactly using just the large integer operations which
are required for the tree construction anyway. Note that scaled
probabilities such as 1/v for arbitrary v cannot be exactly
expressed in this way, which means that the summed N*P values for
that and derived tables will no longer be precisely unity. However,
this does not prevent the construction; it only modifies the
validation checks that are appropriate at each stage. Noting that
this only occurs in the case of lookup-table encoding where the
value of v is small, profiles which adopt this approach should use
the convention that the probability will be taken as (exactly)
n/2^32, where n is the result of dividing 2^32 by v in integer
arithmetic.
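The convention in the last paragraph can be sketched as follows
(fixed_point_probability is an illustrative helper name):

```python
def fixed_point_probability(v):
    # represent 1/v as exactly n / 2^32, with n obtained by dividing
    # 2^32 by v in integer arithmetic, as the convention specifies
    return 2 ** 32 // v
```

For v a power of two the representation is exact; otherwise n/2^32 is
slightly below 1/v, which is why the summed N*P values drop marginally
below unity.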
Appendix B. Example Profile for RTP/UDP/IPv4
This appendix describes an example profile for the compression of
RTP/UDP/IPv4.
Note that the probabilities listed for each encoding method are
initial estimates only. These need to be refined with more accurate
values from genuine RTP/UDP/IPv4 streams.
B.1. General parameters
ALIGN : 8
NPATTERNS : 224
MAXFORMATS : 1000
For simplicity, the lookup tables are initialized next to the
encoding methods in which they are used.
B.2. Field-specific parameters
The field-specific parameters are given below for UO-mode with a
sequential IP ID.
The modifications required to the inputset for a random IP ID are
described in Section B.3.
Each encoding method is listed in the form of a table as per Section
5.2.
RTP/UDP/IPv4-ENCODING:
+------------------------+----------------------------+-------------+
| Field Name(s) | Encoding Method | Probability |
+------------------------+----------------------------+-------------+
| CRC | IRREGULAR(3) | 99% |
| +----------------------------+-------------+
| | IRREGULAR(7) | 1% |
+------------------------+----------------------------+-------------+
| Master Sequence Number | LSB(4,-1) | 100% |
+------------------------+----------------------------+-------------+
| RTP Header | RTP-ENCODING | 100% |
+------------------------+----------------------------+-------------+
| UDP Header | UDP-ENCODING | 100% |
+------------------------+----------------------------+-------------+
| Inner IPv4 Header | IPv4-ENCODING | 100% |
+------------------------+----------------------------+-------------+
| Optional Extensions | EXTENSION-ENCODING | 100% |
+------------------------+----------------------------+-------------+
Notes:
The term "RTP Header" is shorthand for all of the fields in the RTP
header. A similar note applies for the "UDP Header" name etc.
EXTENSION-ENCODING:
+------------------------+----------------------------+-------------+
| Field Name(s) | Encoding Method | Probability |
+------------------------+----------------------------+-------------+
| Inner Auth Header | AH-ENCODING | 99% |
| +----------------------------+-------------+
| | IRREGULAR(0) | 1% |
+------------------------+----------------------------+-------------+
| Inner ESP Header | ESP-ENCODING | 99% |
| +----------------------------+-------------+
| | IRREGULAR(0) | 1% |
+------------------------+----------------------------+-------------+
| Inner GRE Header | GRE-ENCODING | 99% |
| +----------------------------+-------------+
| | IRREGULAR(0) | 1% |
+------------------------+----------------------------+-------------+
| Outer IPv4 Header | OUTER-IPv4-ENCODING | 99% |
| +----------------------------+-------------+
| | IRREGULAR(0) | 1% |
+------------------------+----------------------------+-------------+
| Outer Auth Header | OUTER-AH-ENCODING | 99% |
| +----------------------------+-------------+
| | IRREGULAR(0) | 1% |
+------------------------+----------------------------+-------------+
| Outer ESP Header | OUTER-ESP-ENCODING | 99% |
| +----------------------------+-------------+
| | IRREGULAR(0) | 1% |
+------------------------+----------------------------+-------------+
| Outer GRE Header | OUTER-GRE-ENCODING | 99% |
| +----------------------------+-------------+
| | IRREGULAR(0) | 1% |
+------------------------+----------------------------+-------------+
Notes:
The optional headers in the protocol stack are treated in exactly the
same manner as mandatory headers, except that an additional encoding
method is needed to delete the optional header when it is no longer
needed.
Further information on this topic can be found in Section 6.2.2.
The minimal encapsulation header is not considered because it is not
covered by [ROHC5].
IPv4-ENCODING: TTL{"",""}
+------------------------+----------------------------+-------------+
| Field Name(s) | Encoding Method | Probability |
+------------------------+----------------------------+-------------+
| Version | STATIC | 100% |
| Header Length | | |
| Reserved Flag | | |
| Last Fragment Flag | | |
| Fragment Offset | | |
| Source Address | | |
| Destination Address | | |
+------------------------+----------------------------+-------------+
| Packet Length | INFERRED | 100% |
| Header Checksum | | |
| Protocol | | |
+------------------------+----------------------------+-------------+
| May Fragment Flag | STATIC | 99.9% |
| +----------------------------+-------------+
| | IRREGULAR(1) | 0.1% |
+------------------------+----------------------------+-------------+
| Type of Service | STATIC | 99.9% |
| +----------------------------+-------------+
| | IRREGULAR(8) | 0.1% |
+------------------------+----------------------------+-------------+
| IP-ID | INFERRED | 99% |
| +----------------------------+-------------+
| | LSB(5,0) | 0.7% |
| +----------------------------+-------------+
| | LSB(8,0) | 0.2% |
| +----------------------------+-------------+
| | IRREGULAR(16) | 0.1% |
+------------------------+----------------------------+-------------+
| Time to Live | STATIC | 99% |
| +----------------------------+-------------+
| | READ(TTL,2) | 0.9% |
| +----------------------------+-------------+
| | WRITE(8,TTL,2) | 0.1% |
+------------------------+----------------------------+-------------+
| Network Byte Order | STATIC | 99.9% |
| +----------------------------+-------------+
| | IRREGULAR(1) | 0.1% |
+------------------------+----------------------------+-------------+
Notes:
The Header Length field is not static when IP options are used. The
inputset could be extended to handle this possibility if needed.
The inputset could also be extended to handle packet fragmentation if
required.
The Network Byte Order (NBO) field is a Miscellaneous Information
field (see Section 7.3). Its use is described in Section 4.5.5 of
[ROHC5].
The Packet Length field is inferred from information provided by the
link layer.
The Header Checksum field is inferred by recalculating the IP
checksum at the decompressor.
The Protocol field is inferred from the next header in the chain.
The IP Identification field is inferred from the Master Sequence
Number.
UDP-ENCODING:
+------------------------+----------------------------+-------------+
| Field Name(s) | Encoding Method | Probability |
+------------------------+----------------------------+-------------+
| Source Port | STATIC | 100% |
| Destination Port | | |
+------------------------+----------------------------+-------------+
| UDP Length | INFERRED | 100% |
| UDP Checksum | | |
+------------------------+----------------------------+-------------+
| UDP Checksum Value | UNCOMPRESSED | 100% |
+------------------------+----------------------------+-------------+
Notes:
Even though the UDP Checksum is always classed as INFERRED, it may
still be transmitted in full using the "UDP Checksum Value" field.
The UDP Checksum is then copied from this value.
The UDP Checksum Value field is a Miscellaneous Information field. If
context(UDP checksum) is zero at the compressor then the UDP Checksum
Value field is empty, otherwise it carries hdr(UDP Checksum) in full.
The length of the UDP Checksum Value field is inferred at the
decompressor from context(UDP Checksum). If this is zero then the UDP
Checksum Value field is empty, otherwise it is 16 bits long.
The UDP Length field is inferred from information provided by the
link layer.
The UDP Checksum field is inferred from the UDP Checksum Value field.
If this is empty then the UDP Checksum is zero, otherwise it is
copied in full from the UDP Checksum Value field.
RTP-ENCODING:
+------------------------+----------------------------+-------------+
| Field Name(s) | Encoding Method | Probability |
+------------------------+----------------------------+-------------+
| Version | STATIC | 100% |
| Padding | | |
| Extension | | |
| SSRC | | |
+------------------------+----------------------------+-------------+
| CSRC Counter | INFERRED | 100% |
| Sequence Number | | |
| RTP Timestamp | | |
+------------------------+----------------------------+-------------+
| Payload Type | STATIC | 99.9% |
| +----------------------------+-------------+
| | IRREGULAR(7) | 0.1% |
+------------------------+----------------------------+-------------+
| RTP Marker | VALUE("0") | 90% |
| TS +----------------------------+-------------+
| | TALKSPURT | 10% |
+------------------------+----------------------------+-------------+
| Tsc | VALUE("0") | 99.9% |
| +----------------------------+-------------+
| | VALUE("1") | 0.1% |
+------------------------+----------------------------+-------------+
| TS Stride | STATIC | 99.9% |
| +----------------------------+-------------+
| | IRREGULAR(8) | 0.04% |
| +----------------------------+-------------+
| | IRREGULAR(16) | 0.03% |
| +----------------------------+-------------+
| | IRREGULAR(24) | 0.02% |
| +----------------------------+-------------+
| | IRREGULAR(32) | 0.01% |
+------------------------+----------------------------+-------------+
| Time Stride | STATIC | 99.9% |
| +----------------------------+-------------+
| | IRREGULAR(8) | 0.04% |
| +----------------------------+-------------+
| | IRREGULAR(16) | 0.03% |
| +----------------------------+-------------+
| | IRREGULAR(24) | 0.02% |
| +----------------------------+-------------+
| | IRREGULAR(32) | 0.01% |
+------------------------+----------------------------+-------------+
| CSRC Item 1 | CSRC-ENCODING(1) | 100% |
+------------------------+----------------------------+-------------+
| CSRC Item 2 | CSRC-ENCODING(2) | 100% |
+------------------------+----------------------------+-------------+
| ... | ... | ... |
+------------------------+----------------------------+-------------+
| CSRC Item 16 | CSRC-ENCODING(16) | 100% |
+------------------------+----------------------------+-------------+
Notes:
Self-describing variable-length values are not needed by EPIC.
The TS, Tsc, TS Stride and Time Stride fields are all Miscellaneous
Information fields. They are used to reconstruct the RTP Timestamp at
the decompressor. Their use is explained in Section 4.5.3 and Section
4.5.4 of [ROHC5].
The CSRC Counter field is inferred by counting the number of non-
empty CSRC Item fields.
The Sequence Number field is inferred from the Master Sequence Number
by linear extrapolation.
The RTP Timestamp is inferred from the TS, Tsc, TS Stride and Time
Stride fields as in Section 4.5.3 and Section 4.5.4 of [ROHC5].
TALKSPURT:
+------------------------+----------------------------+-------------+
| Field Name(s) | Encoding Method | Probability |
+------------------------+----------------------------+-------------+
| RTP Marker | VALUE("0") | 80% |
| +----------------------------+-------------+
| | VALUE("1") | 20% |
+------------------------+----------------------------+-------------+
| TS | IRREGULAR(5) | 90% |
| +----------------------------+-------------+
| | IRREGULAR(8) | 9.9% |
| +----------------------------+-------------+
| | IRREGULAR(32) | 0.1% |
+------------------------+----------------------------+-------------+
Notes:
The RTP Marker and TS fields are given their own encoding method
because they behave in a dependent manner.
A talkspurt requires some LSBs of the timestamp to be communicated
so that the correct timestamp value can be inferred from the wall
clock. The RTP Marker is "1" at the beginning of each talkspurt, but
is "0" when the timestamp LSBs are resent for robustness.
CSRC-ENCODING(i): CSRCVALUEi{""}
+------------------------+----------------------------+-------------+
| Field Name(s) | Encoding Method | Probability |
+------------------------+----------------------------+-------------+
| CSRC Item i | STATIC | 100-a-b-c% |
| +----------------------------+-------------+
| | IRREGULAR(0) | a% |
| +----------------------------+-------------+
| | READ(CSRCVALUEi,1) | b% |
| +----------------------------+-------------+
| | WRITE(32,CSRCVALUEi,1) | c% |
+------------------------+----------------------------+-------------+
Note: a = b = c = (17 - i) / 100
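The note above can be checked with a few lines of arithmetic (a
hypothetical sketch; a, b and c are percentages as in the table):

```python
def csrc_probabilities(i):
    # For CSRC Item i, each of the IRREGULAR(0), READ and WRITE
    # encodings is given a = b = c = (17 - i) / 100 percent, and
    # STATIC takes whatever probability remains.
    a = (17 - i) / 100.0
    return {"STATIC": 100.0 - 3 * a,
            "IRREGULAR(0)": a, "READ": a, "WRITE": a}
```

Under these inputs CSRC Item 1 uses STATIC 99.52% of the time and
CSRC Item 16 uses it 99.97% of the time, i.e. items later in the
list are assumed to change less often.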
AH-ENCODING: COMMONLENGTHS{"00","03","04","05","06"}
AH-SPI{"","","","",""}
+------------------------+----------------------------+-------------+
| Field Name(s) | Encoding Method | Probability |
+------------------------+----------------------------+-------------+
| AH Next Header | INFERRED | 100% |
| AH Reserved | | |
+------------------------+----------------------------+-------------+
| AH Length | STATIC | 99.9% |
| +----------------------------+-------------+
| | READ(COMMONLENGTHS,5) | 0.09% |
| +----------------------------+-------------+
| | WRITE(8,COMMONLENGTHS,5) | 0.01% |
+------------------------+----------------------------+-------------+
| AH SPI | STATIC | 99.9% |
| +----------------------------+-------------+
| | READ(AH-SPI,5) | 0.099% |
| +----------------------------+-------------+
| | WRITE(32,AH-SPI,5) | 0.001% |
+------------------------+----------------------------+-------------+
| AH Sequence Number | INFERRED | 99% |
| +----------------------------+-------------+
| | LSB(8,-1) | 0.9% |
| +----------------------------+-------------+
| | IRREGULAR(32) | 0.1% |
+------------------------+----------------------------+-------------+
| AH Authentication Data | UNCOMPRESSED | 100% |
+------------------------+----------------------------+-------------+
Notes:
The initial values for the COMMONLENGTHS lookup table are given in
hexadecimal. They represent the lengths of the AH Authentication Data
field for common hashing functions such as MD5, SHA-1 and so on.
If the AH SPI field is null (i.e. 0 bits long) then the AH header is
not present and hence all of the UNCOMPRESSED and INFERRED fields are
null. Otherwise:
The length of the AH Authentication Data field is known from the AH
Length field.
The AH Next Header field is inferred from the next header in the
extension chain.
The AH Reserved field is "0000000000000000".
The AH Sequence Number field is inferred from the Master Sequence
Number by linear extrapolation.
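The READ and WRITE encodings used above consult a small lookup table
of previously seen values, held at both compressor and decompressor.
A minimal sketch of how such a table might behave, assuming
WRITE(k,TABLE,n) sends a k-bit value in full and stores it in an
n-entry table, while READ(TABLE,n) sends only the index of a stored
value (the class and its eviction policy are illustrative
assumptions, not part of this specification):

```python
class LookupTable:
    def __init__(self, initial):
        # e.g. COMMONLENGTHS starts as ["00","03","04","05","06"]
        self.entries = list(initial)

    def write(self, value, n):
        # WRITE transmits the value in full and caches it,
        # evicting the oldest entry once the table holds n values.
        self.entries.append(value)
        if len(self.entries) > n:
            self.entries.pop(0)
        return value

    def read(self, index):
        # READ transmits only the (much shorter) table index.
        return self.entries[index]
```

An AH Length seen once via WRITE can thereafter be sent as roughly a
3-bit index into the 5-entry table instead of 8 bits in full.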
ESP-ENCODING: PADDINGLENGTHS{0,0,0,0,0}
ESP-SPI{0,0,0,0,0}
+------------------------+----------------------------+-------------+
| Field Name(s) | Encoding Method | Probability |
+------------------------+----------------------------+-------------+
| ESP SPI | STATIC | 99.9% |
| +----------------------------+-------------+
| | READ(ESP-SPI,5) | 0.099% |
| +----------------------------+-------------+
| | WRITE(32,ESP-SPI,5) | 0.001% |
+------------------------+----------------------------+-------------+
| ESP Sequence Number | INFERRED | 99% |
| +----------------------------+-------------+
| | LSB(8,-1) | 0.9% |
| +----------------------------+-------------+
| | IRREGULAR(32) | 0.1% |
+------------------------+----------------------------+-------------+
| ESP Padding | INFERRED | 99.99% |
| +----------------------------+-------------+
| | UNCOMPRESSED | 0.01% |
+------------------------+----------------------------+-------------+
| ESP Padding Length | STATIC | 99.9% |
| +----------------------------+-------------+
| | READ(PADDINGLENGTHS,5) | 0.09% |
| +----------------------------+-------------+
|                        | WRITE(8,PADDINGLENGTHS,5)  | 0.01%       |
+------------------------+----------------------------+-------------+
| ESP Next Header | INFERRED | 100% |
+------------------------+----------------------------+-------------+
| ESP Auth Data | IRREGULAR(96) | 100% |
+------------------------+----------------------------+-------------+
Notes:
If the ESP SPI field is null then the ESP header is not present and
hence all of the UNCOMPRESSED and INFERRED fields are null.
Otherwise:
The length of the ESP Padding field is known from the ESP Padding
Length field.
The ESP Sequence Number field is inferred from the Master Sequence
Number by linear extrapolation.
The ESP Padding field consists of the octets 1, 2, 3, ..., k, where
k is the ESP Padding Length.
The ESP Next Header field is inferred from the next header in the
extension chain.
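The padding rule above means the decompressor can regenerate the ESP
Padding from the ESP Padding Length alone; a sketch (the function
name is invented):

```python
def infer_esp_padding(pad_len):
    # The padding octets take the values 1, 2, 3, ..., k, where
    # k is the ESP Padding Length, so no padding bits need to be
    # carried in the compressed header.
    return bytes(range(1, pad_len + 1))
```

For a padding length of 4 this yields the octets 01 02 03 04.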
GRE-ENCODING:
+------------------------+----------------------------+-------------+
| Field Name(s) | Encoding Method | Probability |
+------------------------+----------------------------+-------------+
| GRE Key Present | INFERRED | 100% |
| GRE Seq Num Present | | |
| GRE Flags | | |
| GRE Version | | |
| GRE Protocol | | |
| GRE Checksum | | |
+------------------------+----------------------------+-------------+
| GRE Checksum Present | STATIC | 99.9% |
| GRE Routing Present +----------------------------+-------------+
| GRE Strict Source Rte | IRREGULAR(1) | 0.1% |
+------------------------+----------------------------+-------------+
| GRE Recursion Control | STATIC | 99.9% |
| +----------------------------+-------------+
| | IRREGULAR(3) | 0.1% |
+------------------------+----------------------------+-------------+
| GRE Checksum Value | UNCOMPRESSED | 100% |
+------------------------+----------------------------+-------------+
| GRE Offset | STATIC | 99% |
| +----------------------------+-------------+
| | IRREGULAR(0) | 0.5% |
| +----------------------------+-------------+
| | IRREGULAR(16) | 0.5% |
+------------------------+----------------------------+-------------+
| GRE Key | STATIC | 99% |
| +----------------------------+-------------+
| | IRREGULAR(0) | 0.5% |
| +----------------------------+-------------+
| | IRREGULAR(32) | 0.5% |
+------------------------+----------------------------+-------------+
| GRE Sequence Number | INFERRED | 99% |
| +----------------------------+-------------+
| | LSB(8,-1) | 0.9% |
| +----------------------------+-------------+
| | IRREGULAR(32) | 0.1% |
+------------------------+----------------------------+-------------+
| GRE Routing | STATIC | 99.9% |
| +----------------------------+-------------+
| | UNCOMPRESSED | 0.1% |
+------------------------+----------------------------+-------------+
Notes:
The GRE Checksum Value field is a Miscellaneous Information field. If
context(GRE Checksum) is zero at the compressor then the GRE Checksum
Value field is empty, otherwise it carries hdr(GRE Checksum) in full.
If the GRE Recursion Control field is null then the GRE header is not
present and hence all of the INFERRED fields are null. Otherwise:
The length of the GRE Checksum Value field is inferred at the
decompressor from context(GRE Checksum). If this is zero then the GRE
Checksum Value field is empty, otherwise it is 16 bits long.
The GRE Routing field contains its own built-in length indicator.
The GRE Flags field is "00000" and the GRE Version field is "000".
The GRE Sequence Number field is inferred from the Master Sequence
Number by linear extrapolation.
The GRE Protocol field is inferred from the next header in the
extension chain.
The GRE Key Present and GRE Seq Num Present fields are "0" if their
respective fields are null, otherwise they are "1".
If the GRE Checksum Present and GRE Routing Present fields are both
"0" then the GRE Checksum is null, else it is inferred from the GRE
Checksum Value field. If this field is empty then the GRE Checksum
is zero, otherwise it is copied in full from the GRE Checksum Value
field.
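The GRE Checksum reconstruction described above can be written as a
short decision procedure (an illustration with invented names;
checksum_value is None when the GRE Checksum Value field is empty):

```python
def infer_gre_checksum(checksum_present, routing_present,
                       checksum_value):
    # If neither flag is set the GRE Checksum field is absent.
    if checksum_present == "0" and routing_present == "0":
        return None
    # Field present: an empty Checksum Value field means the
    # checksum is zero; otherwise copy the value in full.
    if checksum_value is None:
        return 0
    return checksum_value
```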
OUTER-IPv4-ENCODING:
Identical to IPv4-ENCODING but for fields in the outer IP header.
Note that the NBO field is replaced by the NBO2 field.
OUTER-AH-ENCODING:
Identical to AH-ENCODING but for fields in the outer Authentication
Header.
OUTER-ESP-ENCODING:
Identical to ESP-ENCODING but for fields in the outer ESP header.
OUTER-GRE-ENCODING:
Identical to GRE-ENCODING but for fields in the outer GRE header.
B.3. Changes required for a random IP ID
If the IP ID changes randomly then a new inputset can be used with
the following encoding:
+------------------------+----------------------------+-------------+
| IP-ID | IRREGULAR(16) | 100% |
+------------------------+----------------------------+-------------+
All other general and field-specific parameters are as above. Note
that the two inputsets define two different sets of compressed header
formats (one for sequential IP IDs, one for random IP IDs).
Since the change between a sequential and a random IP ID occurs only
rarely, there is no need to reserve space in the compressed headers
to indicate when to change between the two sets of compressed header
formats. Instead the change should be communicated using a STATIC or
DYNAMIC packet.
Appendix C. Proof of Optimal Efficiency
A very important property of the compressed headers built by EPIC is
that they are optimally efficient. This means that the average
compressed header size of EPIC is provably the minimum possible for
the chosen protocol stack. The precise statement of optimal
efficiency is given below:
"If every compressed header format specified by EPIC is used with
the claimed probability, and if the distribution of headers within
each format is flat, then the EPIC compressed headers are optimally
efficient for the chosen level of robustness and word alignment".
Note that the two caveats in the statement simply ensure that EPIC
is programmed correctly for the protocol stack it is compressing. If
the input to EPIC states that Field 1 will need LSB(4,0) encoding
20% of the time, then optimal efficiency is only achieved if this
really is the case. Similarly, each of the headers that uses a given
compressed header format must occur equally often; otherwise the
extra non-randomness could be used to compress the headers even
further.
The proof of optimal efficiency is straightforward in the bit-
aligned case. Ordinary Huffman encoding works on a stream of
characters and provides optimal compression for each character in
turn, as explained in [HUFF]. Since Hierarchical Huffman gives
semantically identical results, the proof of optimality applies to
it as well. EPIC treats an entire header as a single character in
the Hierarchical Huffman tree, so it attains optimal compression of
every header in the chosen protocol stack.
In the general word-aligned case the compressed headers produced by
EPIC are optimally efficient for the chosen alignment (for example
in the octet-aligned case the average compressed header size is
minimal given that all headers must be a whole number of octets).
The proof can be easily adapted from the bit-aligned case.
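The argument can be illustrated numerically. The sketch below builds
an ordinary (non-hierarchical) Huffman code over four hypothetical
compressed header formats and checks that, for these dyadic
probabilities, the average code length meets the entropy bound
exactly (an illustration of the [HUFF] optimality result, not the
EPIC toolset itself):

```python
import heapq
import math

def huffman_lengths(probs):
    # Standard Huffman construction: repeatedly merge the two
    # least probable nodes; each merge adds one bit to the code
    # length of every symbol beneath it.
    heap = [(p, [i]) for i, p in enumerate(probs)]
    lengths = [0] * len(probs)
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)
        p2, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths

# Four header formats with probabilities 1/2, 1/4, 1/8, 1/8.
probs = [0.5, 0.25, 0.125, 0.125]
lengths = huffman_lengths(probs)
avg = sum(p * l for p, l in zip(probs, lengths))
entropy = -sum(p * math.log2(p) for p in probs)
# For dyadic probabilities the Huffman code lengths are 1, 2, 3,
# 3 bits, and avg == entropy == 1.75 bits per header.
```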