Network Working Group Chia Yuan Cho
Internet-document Sukanta Kumar Hazra
Expires: August 2004
February 9, 2004
Statistical Inter-flow Field Behaviour
for Context Replication in ROHC-TCP
<draft-cho-rohc-tcp-interflow-behaviour-00.txt>
Status of This Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Copyright Notice
Copyright (C) The Internet Society (2004). All Rights Reserved.
Abstract
Context replication increases header compression gains by reducing
the redundancy between flows via efficient replicate (IR-CR) packets.
The optimum design of IR-CR packet formats requires a detailed
understanding of inter-flow redundancy. As context replication is
best suited to TCP, this document presents a statistical
analysis of TCP/IP inter-flow field behaviour. Based on the analysis,
recommendations on ROHC-TCP packet format specifications for context
replication are made. It is also shown that inter-flow field
behaviour is inherently and significantly asymmetrical, and various
ways of handling it are considered. Finally, based on the inter-flow
behaviour of the TCP Window field, it is noted that current encoding
methods do not compress it efficiently.
Cho & Hazra [Page 1]
Internet-document Statistical Inter-flow Field Behaviour February 2004
for Context Replication in ROHC-TCP
Table of contents
1. Introduction
2. Terminology
3. Header Compression Model
4. Methodology
5. Results
5.1. IPv4 Identification
5.2. IP Don't Fragment and Time To Live
5.3. IP Destination Address
5.4. TCP Source Port
5.5. TCP Destination Port
5.6. TCP Sequence Number and Acknowledgement Number
5.7. TCP Flags and Urgent Pointer
5.8. TCP Window
5.9. TCP Checksum
5.10. TCP Options
5.11. Mean Sizes of Compressed Fields
6. Handling Asymmetrical Inter-flow Behaviour
7. Security Considerations
8. References
9. Authors' Addresses
Appendix A. State Transition Threshold
1. Introduction
Context replication offers an alternative to the conventional context
initialization procedure by performing context initialization via
more efficient IR-CR packets. In contrast to IR packets, which
contain mostly uncompressed fields, IR-CR packets carry compressed
header fields, obtained by reducing the redundancy between packets of
different flows. As such, header compression can possibly start right
from the first packet of a flow and compression efficiency is
improved.
The motivations for context replication, as well as details of
the context replication mechanism, are given in [ROHC-CR]. Although
context replication is a general ROHC mechanism, this document
focuses on the application of context replication to the ROHC-TCP
profile in particular. This is because the motivation for context
replication originated from the ROHC-TCP profile, and furthermore due
to TCP's 'short-lived' characteristic, context replication is able to
improve header compression gains most significantly for the ROHC-TCP
profile.
Context replication is possible due to significant redundancy between
multiple simultaneous, or near-simultaneous flows passing through the
same compressor-decompressor pair. For any header compression scheme
to work, the first step is to understand the field behaviour in
order to recognize areas of redundancy. Context replication, by its
nature, depends on the relatively unexplored inter-flow field
behaviour rather than the well-understood intra-flow field behaviour.
In that respect, [TCP-BEH] provides a detailed qualitative analysis
of TCP/IP field behaviour. However, it focuses more on the intra-flow
aspect than on the inter-flow aspect, and this document is meant in
part as an extension to it. The difficulty in
understanding and describing inter-flow field behaviour is compounded
by the fact that it depends on human usage patterns, in addition to
the underlying protocol characteristics. This gives inter-flow field
behaviour a much larger variance and higher degree of uncertainty.
In this document, a method of extracting the inter-flow field
behaviour relevant for context replication is presented, as well as
the quantitative results of statistical analysis on the TCP/IP inter-
flow behaviour, based on four TCPdump traces containing 1.9 million
TCP/IP packet samples. From the results, a number of
recommendations are made. Firstly, a likely optimum combination
of encoding methods for each field during context replication is
recommended, together with parameters and estimated
probabilities of success for each encoding method. Secondly, it is
shown that inter-flow field behaviour is significantly asymmetrical,
and ways of handling this behaviour are explored. Finally, it is
noted that current encoding methods can be improved upon to compress
the Window field more efficiently.
For verification of the replicate packet format specifications
prescribed in this document, the EPIC-LITE implementation [EPIC-IMPL]
from the University of Split was modified to support context
replication.
2. Terminology
This document reuses some of the terminology found in [RFC-3095],
[ROHC-TCP], [ROHC-CR], [TCP-BEH], [EPIC-LITE] and [ROHC-FN]. In
addition, this document defines the following terms:
'Incoming' and 'Outgoing' Packets
'Incoming' packets are packets traveling towards client hosts
through the channel of interest over which ROHC is employed.
'Outgoing' packets are packets traveling away from client hosts
through the channel of interest over which ROHC is employed.
Asymmetrical Header Compression
Header Compression is performed asymmetrically when 'incoming' and
'outgoing' packets are compressed differently. This requires the
packet format specifications for compressor-decompressor pairs to
be configured differently depending on the direction of packet
flow they deal with.
Replication Match Rate
The replication match rate for a trace is defined as the percentage
of uni-directional flows within the trace which can be context
replicated. A new flow is replicable when there is at least one
suitable base context present in the compressor upon arrival of the
first packet of the flow. The match rate is used to estimate the
probability that context replication can be used for context
initialization.
State Transition Threshold
The State Transition Threshold for a uni-directional flow is the
number of initial TCP/IP packets (near the start of a flow)
converted into IR or IR-CR packets.
3. Header Compression Model
With the objective of extracting the TCP/IP inter-flow field
behaviour, we focus on the deployment of ROHC over the final hop. The
ROHC compressor-decompressor pair is deployed at the two endpoints of
the (possibly wireless) low-bandwidth channel and cooperates to
transmit packets efficiently in the direction towards the
decompressor. Since TCP requires a full-duplex channel, another
compressor-decompressor pair may be present to compress packets in
the reverse direction. Considering the direction of flow of packets
with respect to clients using the low-bandwidth channel, packets can
thus be classified as 'incoming' and 'outgoing'. 'Incoming' and
'outgoing' packets use different compressor-decompressor pairs. This
is shown in Fig. 1.
Although ROHC was originally targeted at cellular links, the
convergence of the telecommunication and computer communication
industries means that it may be employed over wireless links in
general. As such, the header compression model in Fig. 1 does not
define the target 'low-bandwidth' channel explicitly. Mobile Terminal
clients are connected to the Internet via a last-hop router node as
seen in Fig. 1, where we focus on the 'header compression entity'
situated at the data link layer of the node. This can have
different manifestations depending on the nature of the wireless
+---+ 'outgoing'
| C |---
+---+ --- +-------+ +------+
| D |<-- --- | +---+ | -->|Server|
+---+ --- -->| | D | | - - - - - - - - - -- +------+
--- | +---+ | / \ --
'incoming' ---| | C | | | |<--
| +---+ | | | +------+
Clients | |<->| Internet |<-------->|Server|
| +---+ | | | +------+
'outgoing' -->| | D | | | |<--
--- | +---+ | \ / --
+---+ --- ---| | C | | - - - - - - - - - -- +-------+
| C |--- --- | +---+ | -->| Other |
+---+ --- +-------+ |Clients|
| D |<-- Last-hop +-------+
+---+ 'incoming' Router
|__________| |______________________|________|
Low-bandwidth Wired Wired or Wireless
Channel
C - Compressor
D - Decompressor
Fig. 1: Header compression model showing 'incoming' and 'outgoing'
flows
link. For example, in Universal Mobile Telecommunications System
(UMTS), the ROHC entity is part of the Packet Data Convergence
Protocol (PDCP) sub-layer on a Base Station; if ROHC is employed over
Wireless Ethernet (IEEE 802.11), it can be part of the data link
layer on a wireless router; in Mobile Ad Hoc networks, the ROHC
entity can reside on a 'forwarding node'.
Due to the nature of the protocol suite under study, we expect
client-server computing to dominate over peer-to-peer, as is the case
currently. As such, 'incoming' and 'outgoing' flows are inherently
asymmetrical. As noted in [ROHC-TCP], some asymmetry is already
present in TCP/IP intra-flow field behaviour. An example is the
relationship between TCP Sequence Number and Acknowledgement Number,
for which 'outgoing' flows are likely to exhibit large deltas between
consecutive packets in Acknowledgement Number and small deltas in
Sequence Number, but the converse is likely for 'incoming' flows.
With respect to context replication, [ROHC-TCP] also acknowledges
some inter-flow asymmetry in the TCP source/destination port.
As will be shown in Section 5, asymmetry becomes even more pronounced
between flows. The above figure partly serves to illustrate that
asymmetrical header compression, if desired, can be achieved by
configuring compressor-decompressor pairs differently based on their
'incoming' or 'outgoing' role.
Finally, it should be noted that the focus on ROHC over the final hop
in Fig. 1 does not reduce the scope of applicability of the obtained
results on inter-flow behaviour. In general, header compression may
be deployed over any hop, e.g. over core network links using
Multiprotocol Label Switching (MPLS), or over intermediate hops in
Mobile Ad Hoc networks. Regardless of the location of ROHC deployment, the
TCP/IP endpoints remain the same. The advantage of focusing on the
last hop, then, is that it allows any asymmetrical behaviour to be
distilled. Bi-directional asymmetry over intermediate hops causes
inherent asymmetrical behaviour to be lost. However, over
intermediate hops, inter-flow results continue to be applicable using
the symmetric treatment as prescribed in Section 6.
4. Methodology
Given the wide variability of inter-flow field behaviour, a suitable
methodology for extracting the inter-flow field behaviour relevant to
context replication is proposed.
Inter-flow field behaviour can be obtained by emulating a
context-replication-enabled compressor. To observe any asymmetrical
behaviour, Tcpdump traces are fed into the 'compressor emulator'
separately, according to the direction they flow, i.e. 'incoming' or
'outgoing'. Thus, the emulator simulates the compressors found on
client terminals and routers in the 'outgoing' and 'incoming'
directions respectively. In the same way as a compressor, the
emulator creates, maintains and updates a list of contexts
dynamically for each arriving packet.
The emulator keeps an extensible list of contexts, one for each
unique TCP connection, arranged in a Most Recently Used (MRU) stack.
Each TCP/IP packet updates the context unique to its flow. A
context retrieved for updating or referencing is placed at the top
of the stack, followed by its base context if a base context has just
been used as reference. Whenever possible, each new
flow is context replicated. Context replication is possible when a
base context exists; the implementation-dependent selection
criteria require the IP source to be shared, with a preference
but no requirement for the same IP destination. For simplicity, all
contexts are assumed to be acknowledged by default. Furthermore, if
the first packet of a flow can be context replicated, then it is
assumed that the subsequent two packets of the flow would also be
replicated. This means that up to the first 3 packets of each flow
are converted into IR-CR packets. This number is the upper bound of
the State Transition Threshold range, and is based on an estimate of
the maximum number of TCP/IP packets that may be converted to IR-CR
packets. This is elaborated in Appendix A.
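The emulator behaviour described above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not
the emulator used for this study: the names FlowKey,
CompressorEmulator, process and match_rate are hypothetical, a context
is reduced to a dictionary of field values, and the three-packet State
Transition Threshold is not modelled.

```python
from collections import OrderedDict, namedtuple

# Hypothetical key identifying one uni-directional TCP flow.
FlowKey = namedtuple("FlowKey", "src dst sport dport")


class CompressorEmulator:
    """Toy context store kept in most-recently-used order."""

    def __init__(self):
        # The last item of the OrderedDict is the top of the MRU stack.
        self.contexts = OrderedDict()
        self.flows = 0        # uni-directional flows seen so far
        self.replicated = 0   # flows for which a base context existed

    def _find_base(self, key):
        """Most recently used context sharing the IP source, preferring
        one that also shares the IP destination."""
        fallback = None
        for candidate in reversed(self.contexts):
            if candidate.src != key.src:
                continue
            if candidate.dst == key.dst:
                return candidate
            if fallback is None:
                fallback = candidate
        return fallback

    def process(self, key, fields):
        """Update the context for one packet; return the base context key
        when the packet starts a new, replicable flow, else None."""
        if key in self.contexts:
            self.contexts[key] = fields
            self.contexts.move_to_end(key)
            return None
        self.flows += 1
        base = self._find_base(key)
        self.contexts[key] = fields              # new context goes on top
        if base is not None:
            self.replicated += 1
            self.contexts.move_to_end(base)      # base context ...
            self.contexts.move_to_end(key)       # ... directly below the new one
        return base

    def match_rate(self):
        """Replication match rate in percent."""
        return 100.0 * self.replicated / self.flows if self.flows else 0.0
```

Feeding the 'incoming' and 'outgoing' halves of a trace into two
separate instances of such a class would give one match rate per
direction.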
Even though we show results at the upper bound of the State
Transition Threshold, it was also found that the inter-flow field
behaviour remains invariant at smaller State Transition Threshold
values.
For the purpose of this study, four Tcpdump traces totaling 1.9
million packets were captured from within the Local Area Network of
the Institute for Infocomm Research. The LAN configuration is shown
in Fig. 2. Macro statistics of each trace are shown in Table 1.
+--------+
| Client |
|Terminal|<-
+--------+ -
- +--------+
->|Last-Hop|
->| Router |<-
- +--------+ -
+--------+ - - +--------+
| Client |<- ->| NAT |
|Terminal| ->| Router |<-
+--------+ - +--------+ -
+--------+ - - +--------+
|Last-Hop|<- ->| Border |<->Internet
->| Router | ->|Gateway |
- +--------+ +--------+ - +--------+
- | NAT |<--
... <- ->| Router |
- +--------+
-
... <-
Fig. 2: Configuration of Local Area Network
Three out of four traces were captured at the Border Gateway, so that
traffic from a large number of client terminals can be gathered in
each single trace. However, as in most LANs, Network Address
Translation (NAT) is in use. NAT transparently changes the 'outgoing'
Source IP Address and Port, as well as the 'incoming' Destination IP
Address and Port. Thus, packets captured at the Border Gateway
reflect the changed values rather than the original values. To deal
with this, the fourth trace, TCP180903, captured at a client terminal,
was used to investigate these fields as well as to verify results from
traces captured at the Border Gateway.
+---------------+-----------+------------+------------+-----------+
|     Trace     | TCP180803 | TCP080903a | TCP080903b | TCP180903 |
|Identification |           |            |            |           |
+---------------+-----------+------------+------------+-----------+
|   Duration    |  30 min   |   30 min   |   30 min   | 27.4 hrs  |
+---------------+-----------+------------+------------+-----------+
|   Location    |  Gateway  |  Gateway   |  Gateway   |  Client   |
|               |  Router   |   Router   |   Router   | Terminal  |
+---------------+-----------+------------+------------+-----------+
|No. of packets |  516172   |   509281   |   507293   |  383594   |
+---------------+-----------+------------+------------+-----------+
|  Replication  |   97.5    |    94.4    |    94.3    |   93.4    |
| Match Rate(%) |           |            |            |           |
+---------------+-----------+------------+------------+-----------+
Table 1: Macro statistics of TCPdump traces
By using packets captured from our LAN, it is assumed that TCP/IP
inter-flow field behaviour does not vary significantly between the
wired Ethernet-based channel and the target low bandwidth, possibly
less reliable channel where header compression takes place. Provided
the header compression layer is sufficiently robust to be
transparent, this is reasonable because the upper (network,
transport and application) layer protocol characteristics and human
usage behaviour remain the same.
It is desirable that the inter-flow behaviour of TCP/IP fields is
mapped using a system of classification such that fields within a
category share the same characteristic. [TCP-BEH] already provides a
good system of classification for intra-flow field behaviour:
INFERRED, STATIC, STATIC-DEF, STATIC-KNOWN, CHANGING, where each
category follows some general trend(s) hinting at how fields in that
category may be compressed. For inter-flow behaviour, [TCP-BEH] uses
a different system of classification: 'N/A', 'No', 'Yes', which
unfortunately is less effective, because one can only discern whether
a field is compressible for context replication, but not how to
compress it suitably.
Therefore, in this document, the inter-flow field behaviour is
classified based on the same categories as used for intra-flow
behaviour: INFERRED, STATIC, STATIC-KNOWN, CHANGING. However, it
should be noted that the categories here describe inter-flow field
behaviour. Furthermore, STATIC-DEF is merged into STATIC here because
a separate category for fields that define a packet stream is not
meaningful where inter-flow field behaviour is concerned.
Classification can be done with the help of observing the range of
deltas. Here, delta is defined as the difference between the field
value in the current packet and the field value stored in the
base context. The delta analysis is useful for the following reasons.
For any field not known to be INFERRED or STATIC-KNOWN, if delta = 0
in all samples, then this field is a STATIC field. If not, the field
is categorized as CHANGING. For CHANGING fields, by further analyzing
the range of deltas obtained, it can be found whether the field can
still be encoded using the STATIC encoding method with significant
probability. Where deltas tend to be small, the number of least
significant bits needed (in LSB encoding) to encode that field with a
significant probability of success can be determined. Fields which
tend to have uniformly distributed deltas may only be suitably
encoded as IRREGULAR. Finally, where certain unique trends are
observed for a field, raw and/or network-byte-order converted
versions of field values are also studied.
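The delta-based classification rule above can be illustrated with a
short sketch. The function names classify and static_share are
hypothetical, and the handling of an empty sample (returning None) is
an assumption made here, not something specified by this document.

```python
def classify(deltas, known_category=None):
    """Return the inter-flow category of a field from its observed deltas.

    known_category carries an a-priori INFERRED or STATIC-KNOWN label;
    otherwise the field is STATIC only if every delta is zero.
    """
    if known_category in ("INFERRED", "STATIC-KNOWN"):
        return known_category
    if not deltas:
        return None  # assumption: no samples, no classification
    return "STATIC" if all(d == 0 for d in deltas) else "CHANGING"


def static_share(deltas):
    """Percentage of samples a CHANGING field could still send as STATIC."""
    return 100.0 * sum(1 for d in deltas if d == 0) / len(deltas)


# Example with made-up deltas: mostly unchanged, occasionally different.
ttl_deltas = [0, 0, 0, 0, 0, 0, 0, 0, 0, -3]
print(classify(ttl_deltas), static_share(ttl_deltas))
```

A field such as the one in the example is CHANGING by the rule, yet the
share of zero deltas shows that STATIC encoding would still succeed for
most samples.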
5. Results
Our initial categorization is shown in Table 2. Differences between
intra-flow classification (in [TCP-BEH]) and inter-flow
classification here are marked with '(2)'. At this stage, there is no
asymmetry observed in categorization between 'incoming' and 'outgoing'
flows.
+-----------------------------------+------------+
|Field                              |Category    |
+-----------------------------------+------------+
|IPv4 Version                       |STATIC      |
|IPv4 Header Length                 |STATIC-KNOWN|
|IPv4 Type Of Service               |STATIC(1)   |
|IPv4 ECN Capable Transport         |STATIC(1)   |
|IPv4 Congestion Experienced        |STATIC(1)   |
|IPv4 Packet Length                 |INFERRED    |
|IPv4 Identification                |CHANGING    |
|IPv4 Reserved Flag                 |STATIC(1)   |
|IPv4 Don't Fragment Flag           |CHANGING    |
|IPv4 More Fragments Flag           |STATIC-KNOWN|
|IPv4 Fragment Offset               |STATIC-KNOWN|
|IPv4 Time To Live                  |CHANGING    |
|IPv4 Protocol                      |STATIC      |
|IPv4 Header Checksum               |INFERRED    |
|IPv4 Source Address                |STATIC      |
|IPv4 Destination Address           |CHANGING(2) |
|TCP Source Port                    |CHANGING(2) |
|TCP Destination Port               |CHANGING(2) |
|TCP Sequence Number                |CHANGING    |
|TCP Acknowledgement Number         |CHANGING    |
|TCP Data Offset                    |INFERRED    |
|TCP Reserved                       |STATIC(1)   |
|TCP Congestion Window Reduced      |STATIC(1)   |
|TCP Echo Congestion Experienced    |STATIC(1)   |
|TCP URG flag                       |CHANGING    |
|TCP ACK flag                       |CHANGING    |
|TCP PSH flag                       |CHANGING    |
|TCP RST flag                       |CHANGING    |
|TCP SYN flag                       |CHANGING    |
|TCP FIN flag                       |CHANGING    |
|TCP Window                         |CHANGING    |
|TCP Checksum                       |CHANGING    |
|TCP Urgent Pointer                 |CHANGING    |
|TCP Options                        |CHANGING    |
+-----------------------------------+------------+
(1)These fields were found to be STATIC from samples, but context
replication should follow the classification in [TCP-BEH] for
future-proofing.
(2)Differs from intra-flow classification [TCP-BEH] due to context
replication.
Table 2: TCP/IP Fields and Classifications
Some changes in categorization are made in this study because of the
current slow adoption of IP and TCP congestion notification fields.
However, these fields are expected to be used in the future and
should be CHANGING instead of STATIC.
The encoding methods to be used for STATIC, STATIC-KNOWN and INFERRED
fields are straightforward, but CHANGING fields need to be further
analyzed. This is done in the subsequent sub-sections. CHANGING
fields can sometimes be encoded with STATIC, LSB, or other encoding
methods with significant probability. For LSB encoding, it is desired
to determine the suitable number of least significant bits to be used
to encode that field. Therefore, our frequency bins are defined in
increasing ceil(log2(|delta|+1)) (the reason for this expression
is explained in Section 5.1), which is effectively the
minimum number of bits that can encode the delta values within
that bin. Negative delta values are mapped to -ceil(log2(|delta|+1)),
and are useful for defining the offset value used in LSB encoding.
From our frequency tables, we can also derive a suitable combination
of encoding methods to use, as well as the estimated probability of
each encoding method being used.
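The binning rule above can be written compactly as shown below. This
is a sketch with hypothetical function names (encoded_bits,
bin_frequencies). It relies on the fact that, for a positive integer
m, ceil(log2(m+1)) equals the number of bits in the binary
representation of m, which avoids floating-point logarithms.

```python
from collections import Counter


def encoded_bits(delta):
    """Signed bin index n: ceil(log2(|delta|+1)), negated for delta < 0."""
    n = abs(delta).bit_length()  # equals ceil(log2(|delta| + 1))
    return n if delta >= 0 else -n


def bin_frequencies(deltas):
    """Percentage of samples falling into each bin, keyed by n."""
    counts = Counter(encoded_bits(d) for d in deltas)
    total = len(deltas)
    return {n: 100.0 * c / total for n, c in sorted(counts.items())}


# Made-up deltas for illustration only.
print(bin_frequencies([1, 1, 2, 3, 7, -1, 300, 0]))
```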
The inter-flow behaviour of CHANGING fields can be summarized
directly in the form of packet format specifications for IR-CR
packets. This is shown in Fig. 3, in EPIC-LITE terminology
[EPIC-LITE], which is derived from the BNF input language [RFC-2234].
To illustrate asymmetrical inter-flow behaviour, packet format
specifications with any differences between 'incoming' and 'outgoing'
flows are defined separately for each field with the postfix "_in" or
"_out". Note however that if the same set of encoding methods is
used in both directions for the same field, and only the
probabilities are different, then it may mean that significant
asymmetrical behaviour has not been observed.
Identification_in ::= NBO(16) ;network byte order
    LSB(3,-1,50%) | LSB(8,-1,17%) | IRREGULAR(33%)
Identification_out ::= NBO(16)
    LSB(3,-1,65%) | LSB(8,-1,14%) | IRREGULAR(21%)
Don't_Fragment_in ::= STATIC(73%) | IRREGULAR(1,27%)
Don't_Fragment_out ::= STATIC(99%) | IRREGULAR(1,1%)
Time_To_Live_in ::= STATIC(98%) | IRREGULAR(8,2%)
Time_To_Live_out ::= STATIC(97%) | IRREGULAR(8,3%)
Destination_Address_in ::= STATIC(100%)
Destination_Address_out ::= STATIC(86%) | IRREGULAR(32,14%)
Source_Port_in ::= STATIC(70%) | IRREGULAR(16,30%)
Source_Port_out ::= LSB(3,0,73%) | LSB(8,0,14%) | IRREGULAR(16,13%)
Destination_Port_in ::= LSB(3,0,73%) | LSB(8,0,14%) |
    IRREGULAR(16,13%)
Destination_Port_out ::= STATIC(70%) | IRREGULAR(16,30%)
Sequence_Number ::= IRREGULAR(32,100%)
Acknowledgement_Number_in ::= IRREGULAR(32,100%)
Acknowledgement_Number_out ::= VALUE(32,0,33%) | IRREGULAR(32,67%)
URG_flag ::= IRREGULAR(1,100%)
ACK_flag ::= IRREGULAR(1,100%)
PSH_flag ::= IRREGULAR(1,100%)
RST_SYN_FIN_flag ::= VALUE(3,2,30%) | VALUE(3,0,65%) |
    IRREGULAR(3,5%)
Urgent_Pointer ::= STATIC(99%) | IRREGULAR(16,1%)
Window_in ::= STATIC(30%) | IRREGULAR(16,70%)
Window_out ::= STATIC(43%) | IRREGULAR(16,57%)
Fig. 3. Packet format specifications for CHANGING fields.
In Fig. 3, specifications are expressed in the notation used by
EPIC-LITE instead of the Formal Notation [ROHC-FN] for a number of
reasons. Firstly, basic encoding methods used in both remain the
same, and so EPIC-LITE expressions can be easily converted into
Formal Notation. Moreover, the equivalent of the
'multiple_packet_formats' encoding method in ROHC-FN, used to specify
multiple encoding methods for a field, can be represented in a more compact
form using the OR operator, '|' in EPIC-LITE. Also, because EPIC-LITE
involves Huffman coding, it allows the expression of the probability
of each encoding method being successful as a parameter, which is
also useful for expressing the frequency of use of an encoding
method. Finally, it allows the packet format specifications to be
readily verified via context replication implementation in EPIC-LITE.
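As a rough illustration of how the probability parameters in Fig. 3
can be read, the sketch below computes the probability-weighted number
of field bits for a few of the specifications. It is a simplification
made here, not a figure from this study: STATIC is counted as 0 bits,
and the indicator bits that EPIC-LITE spends on signalling which
alternative was chosen are ignored. The names FORMATS and expected_bits
are hypothetical.

```python
# (field bits, probability in percent) for each alternative, read from Fig. 3.
# STATIC alternatives are counted as 0 bits.
FORMATS = {
    "Identification_in": [(3, 50), (8, 17), (16, 33)],
    "Identification_out": [(3, 65), (8, 14), (16, 21)],
    "Window_in": [(0, 30), (16, 70)],
    "Window_out": [(0, 43), (16, 57)],
}


def expected_bits(alternatives):
    """Probability-weighted field size in bits, excluding indicator bits."""
    if sum(p for _, p in alternatives) != 100:
        raise ValueError("probabilities must sum to 100%")
    return sum(bits * p for bits, p in alternatives) / 100.0


for name, alternatives in FORMATS.items():
    print(f"{name}: {expected_bits(alternatives):.2f} bits")
```

Under these assumptions the Window field remains noticeably more
expensive than the Identification field in both directions, which is
in line with the observation that current encoding methods do not
compress the Window field efficiently.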
Details of the inter-flow behaviour of each CHANGING field are
elaborated in the following sub-sections.
5.1. IPv4 Identification
Table 3 shows the distribution of delta values in logarithmic scale.
Note that for delta > 0, the number of bits used to encode the delta
may be expressed as n = ceil(log2(|delta|+1)), as we are trying to
find the smallest n for which delta <= 2^n - 1. For delta < 0, the
equivalent mapping is n = -ceil(log2(|delta|+1)).
+--------+---------------+-----------+-----------+
|Encoded |  Delta Range  | Incoming  | Outgoing  |
|Bits,n  |               | Frequency | Frequency |
+--------+---------------+-----------+-----------+
|-16     |[-65535:-32768]|   6.0%    |   2.3%    |
|-15     |[-32767:-16384]|   4.5%    |   2.1%    |
|-14     |[-16383:-8192] |   2.4%    |   2.1%    |
|-13     |[-8191:-4096]  |   1.5%    |   0.8%    |
|-12     |[-4095:-2048]  |   0.7%    |   0.6%    |
|-11     |[-2047:-1024]  |   0.3%    |   0.3%    |
|-10     |[-1023:-512]   |   0.2%    |   0.1%    |
|-9      |[-511:-256]    |   0.1%    |   0.1%    |
|-8      |[-255:-128]    |   0.1%    |   0.1%    |
|-7      |[-127:-64]     |   0.0%    |   0.0%    |
|-6      |[-63:-32]      |   0.0%    |   0.0%    |
|-5      |[-31:-16]      |   0.0%    |   0.0%    |
|-4      |[-15:-8]       |   0.0%    |   0.0%    |
|-3      |[-7:-4]        |   0.1%    |   0.0%    |
|-2      |[-3:-2]        |   0.2%    |   0.2%    |
|-1      |[-1]           |   0.6%    |   0.4%    |
|0       |[0]            |   0.3%    |   0.0%    |
|1       |[1]            |  23.4%    |  33.7%    |
|2       |[2:3]          |  20.6%    |  20.8%    |
|3       |[4:7]          |   6.6%    |  10.5%    |
|4       |[8:15]         |   3.9%    |   4.3%    |
|5       |[16:31]        |   3.6%    |   3.3%    |
|6       |[32:63]        |   3.6%    |   2.4%    |
|7       |[64:127]       |   3.4%    |   2.0%    |
|8       |[128:255]      |   2.3%    |   1.6%    |
|9       |[256:511]      |   2.3%    |   1.2%    |
|10      |[512:1023]     |   1.7%    |   1.2%    |
|11      |[1024:2047]    |   1.4%    |   1.1%    |
|12      |[2048:4095]    |   0.9%    |   1.0%    |
|13      |[4096:8191]    |   1.3%    |   1.1%    |
|14      |[8192:16383]   |   2.5%    |   2.3%    |
|15      |[16384:32767]  |   3.0%    |   2.4%    |
|16      |[32768:65535]  |   2.4%    |   1.9%    |
+--------+---------------+-----------+-----------+
Table 3: Frequency distribution of Identification delta
Slightly asymmetrical behaviour can be observed from Table 3.
'Incoming' replicated packets are less likely to be encodable within 3
bits than 'outgoing' replicated packets. Moreover, 'incoming'
delta values are more widely distributed, with a higher occurrence of
negative deltas as well as deltas encodable in 6 to 10 bits. This is
reasonable because 'incoming' replicated packets face larger deltas
due to busy servers handling multiple connections simultaneously or
near-simultaneously.
Inter-flow Identification deltas for 'outgoing' replicated packets
tend to be smaller than for 'incoming', as clients do not usually
maintain a large number of simultaneous or near-simultaneous TCP
connections.
It should be noted that Table 3 depicts network-byte-order corrected
Identification deltas. Typical implementation policies for
incrementing the IPv4 Identification are: sequential (increments by
1), sequential-jump (typically increments by 256) and random.
Linux-based implementations usually implement the sequential policy,
and older versions of Microsoft Windows usually implement the
sequential-jump policy with a jump size of 256. This is the equivalent
of incrementing the more significant byte of the two-byte
Identification field by 1. From a compression viewpoint,
sequential-jump values can be network-byte-order corrected
(byte-swapped) at the compressor end and reverted to the original form
at the decompressor end. This approach has the advantage of
compressing Identification fields generated by both policies
efficiently using the same encoding method. A network byte order (NBO)
flag is communicated to differentiate between the two policies.
Randomly assigned Identification values cannot be efficiently
compressed and are sent as-is.
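The effect of the byte-order correction can be seen in a small sketch.
The helper names swap16 and id_delta are hypothetical, and the
Identification values are made up for illustration.

```python
def swap16(value):
    """Swap the two bytes of a 16-bit value."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)


def id_delta(current, reference, byte_swapped=False):
    """Delta between two Identification values, modulo 2^16."""
    if byte_swapped:
        current, reference = swap16(current), swap16(reference)
    return (current - reference) & 0xFFFF


# Sequential-jump sender: consecutive values differ by 256 on the wire.
reference, current = 0x1200, 0x1300
print(id_delta(current, reference))                     # delta as seen on the wire
print(id_delta(current, reference, byte_swapped=True))  # delta after correction
```

After the swap, a jump of 256 becomes an increment of 1, so the same
small-delta LSB encoding serves both the sequential and the
sequential-jump policies.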
Current proposals for context replication compress the
IPv4 Identification field into 0 or 16 bits, using the VALUE and
IRREGULAR encoding methods respectively. The VALUE encoding method is
suitable for protocols like DHCP, and is not seen in Fig. 3 because
we are focusing on TCP/IP. However, it can be seen from the above
inter-flow behaviour that this field can also be compressed more
efficiently using LSB encoding, with recommended parameters as shown
in Fig. 3.
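The LSB probabilities given for the Identification field in Fig. 3 can
be approximately recovered from Table 3 by summing the positive-delta
bins. The sketch below does this under two assumptions made here: that
LSB(k,p) covers deltas from -p to 2^k - 1 - p, as in the
interpretation interval of [RFC-3095], and that summing whole bins is
an acceptable approximation even though the window for p = -1 extends
one value beyond the last bin. The function names are hypothetical.

```python
# Positive-delta bins 1..8 of Table 3 (percent), keyed by encoded bits n.
INCOMING = {1: 23.4, 2: 20.6, 3: 6.6, 4: 3.9, 5: 3.6, 6: 3.6, 7: 3.4, 8: 2.3}
OUTGOING = {1: 33.7, 2: 20.8, 3: 10.5, 4: 4.3, 5: 3.3, 6: 2.4, 7: 2.0, 8: 1.6}


def lsb_window(k, p):
    """Smallest and largest delta covered by LSB(k, p) (assumed convention)."""
    return -p, (1 << k) - 1 - p


def lsb_shares(bins, small=3, large=8):
    """Approximate shares (percent) for the small LSB, the large LSB and
    IRREGULAR, obtained by summing whole bins."""
    p_small = sum(f for n, f in bins.items() if 1 <= n <= small)
    p_large = sum(f for n, f in bins.items() if small < n <= large)
    return p_small, p_large, 100.0 - p_small - p_large


print(lsb_window(3, -1), lsb_window(8, -1))
for name, bins in (("incoming", INCOMING), ("outgoing", OUTGOING)):
    s, l, irregular = lsb_shares(bins)
    print(f"{name}: LSB(3) {s:.1f}%  LSB(8) {l:.1f}%  IRREGULAR {irregular:.1f}%")
```

The sums land within about one percentage point of the probabilities
listed for Identification_in and Identification_out in Fig. 3.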
5.2. IP Don't Fragment Flag and Time to Live
The DF Flag is a single bit which may be set or unset. Although it
may be impractical to allow multiple encoding methods for a single-bit
field, for the sake of characterizing its behaviour, STATIC and
IRREGULAR encoding methods are used. The IPv4 TTL (or equivalently,
IPv6 Hop Limit) is an 8-bit field which remains constant when the
route between the two endpoints is unchanged; when the route does
change due to congestion, it is better to simply send the field
uncompressed. Therefore, DF can be analyzed in the same
category as TTL: each is encoded either as STATIC or, uncompressed, as
IRREGULAR. The actual probabilities associated with each encoding
method based on the samples are shown in Table 4.
+----------------+--------+-----------+
|Encoding Method | STATIC | IRREGULAR |
+----------------+--------+-----------+
|          'Incoming' flows           |
+-------------------------------------+
|Don't Fragment  | 72.8%  |   27.2%   |
|Time To Live    | 98.1%  |    1.9%   |
+----------------+--------+-----------+
|          'Outgoing' flows           |
+-------------------------------------+
|Don't Fragment  | 98.5%  |    1.5%   |
|Time To Live    | 96.9%  |    3.1%   |
+-------------------------------------+
Table 4: Percentage frequency of STATIC and IRREGULAR for DF and TTL
5.3. IP Destination Address
We have allowed for an implementation to use context replication
for scenarios where packets share at least the same Source IP
Address, but the Destination IP Address may be different. Therefore,
the Destination IP Address is STATIC when the base context shares it
and IRREGULAR when it does not.
The proportion of IR-CR packets replicable due to the same/different
Destination IP Address is of interest. This determines how effective
the use of context replication to cover different IP Destination
Addresses can be. This proportion is tabulated in Table 5.
+------------+----------+-----------+
| | STATIC | IRREGULAR |
+------------+----------+-----------+
| 'Incoming' | 100.0% | 0.0% |
| 'Outgoing' | 85.8% | 14.2% |
+------------+----------+-----------+
Table 5: Percentage frequency of STATIC and IRREGULAR for IP
Destination Address.
As can be noted from Table 5, the results are skewed towards STATIC
(same Destination IP Address). This is because our emulator selects
the base context with preference for sharing the same Source and
Destination IP Address, although it is much easier to find contexts
sharing only the same Source IP Address. For some intervals, the
proportion of 'outgoing' IRREGULAR cases reached 48%.
Asymmetry is again observed to be inherent between 'incoming' and
'outgoing' flows. 'Incoming' flows originating from Internet servers
are not likely to engage multiple common subnet clients within a
short period of time. However, the converse is true for 'outgoing'
flows, corresponding to prevalent usage patterns.
Our results also justify an implementation which considers context
replication even when the Destination IP Address is different. This
maximizes context replication efficiency gains for 'outgoing' flows.
5.4. TCP Source Port
As can be seen from Table 6, clearly asymmetrical inter-flow
behaviour is observed for the TCP Source Port field. This behaviour
is seen mainly because ports at servers are well-known ports which
remain unchanged.
+---------------------------------------------+
|Encoded | Delta Range | Incoming |Outgoing |
|Bits,n | | Frequency|Frequency|
+--------+---------------+----------+---------+
|-16 |[-65535:-32768]| 0.0% | 0.0% |
|-15 |[-32767:-16384]| 0.0% | 0.0% |
|-14 |[-16383:-8192] | 0.0% | 0.0% |
|-13 |[-8191:-4096] | 0.0% | 0.3% |
|-12 |[-4095:-2048] | 5.8% | 0.2% |
|-11 |[-2047:-1024] | 1.8% | 0.6% |
|-10 |[-1023:-512] | 0.1% | 1.7% |
|-9 |[-511:-256] | 1.0% | 0.0% |
|-8 |[-255:-128] | 0.5% | 0.0% |
|-7 |[-127:-64] | 0.7% | 0.0% |
|-6 |[-63:-32] | 0.7% | 0.0% |
|-5 |[-31:-16] | 0.0% | 0.0% |
|-4 |[-15:-8] | 0.0% | 0.0% |
|-3 |[-7:-4] | 0.3% | 0.1% |
|-2 |[-3:-2] | 0.0% | 0.1% |
|-1 |[-1] | 0.0% | 2.3% |
|0 |[0] | 72.0% | 15.8% |
|1 |[1] | 0.0% | 31.9% |
|2 |[2:3] | 0.0% | 17.4% |
|3 |[4:7] | 0.0% | 7.8% |
|4 |[8:15] | 0.1% | 4.7% |
|5 |[16:31] | 0.1% | 3.3% |
|6 |[32:63] | 0.3% | 2.0% |
|7 |[64:127] | 0.3% | 3.0% |
|8 |[128:255] | 0.7% | 1.1% |
|9 |[256:511] | 0.8% | 2.7% |
|10 |[512:1023] | 3.0% | 3.2% |
|11 |[1024:2047] | 10.5% | 1.5% |
|12 |[2048:4095] | 1.2% | 0.1% |
|13 |[4096:8191] | 0.0% | 0.3% |
|14 |[8192:16383] | 0.0% | 0.0% |
|15 |[16384:32767] | 0.0% | 0.0% |
|16 |[32768:65535] | 0.1% | 0.0% |
+--------+---------------+----------+---------+
Table 6: Frequency distribution of Source Port delta
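The "Encoded Bits, n" column of Table 6 can be read as the signed number of bits needed to represent the delta between the port of the replicated flow and the port in the base context. A minimal Python sketch of that bucketing is given below; the function names and the sign convention (new value minus base value) are assumptions made for illustration.

```python
def port_delta(base_port: int, new_port: int) -> int:
    """Inter-flow delta, assumed here to be new value minus base value."""
    return new_port - base_port


def encoded_bits(delta: int) -> int:
    """Map a delta to its signed bit-count bucket n.

    n = 0 for delta 0, n = 1 for delta 1, n = 2 for deltas 2..3,
    n = 3 for deltas 4..7, and so on; negative deltas give negative n.
    """
    magnitude = abs(delta).bit_length()
    return magnitude if delta >= 0 else -magnitude


# Hypothetical example: two client connections opened back to back
# from consecutive ephemeral ports.
bucket = encoded_bits(port_delta(base_port=1025, new_port=1026))
```

Under this reading, a delta that falls in bucket n can be carried in |n| bits plus whatever is needed to signal the sign and the chosen format.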
5.5. TCP Destination Port
The inter-flow behaviour of the TCP Destination Port field is shown
in Table 7. It can be observed that the trend is the opposite of
that of the TCP Source Port presented previously. This is because
the Destination Ports of 'outgoing' packets are the Source Ports of
the replying 'incoming' packets.
+--------+---------------+-----------+-----------+
|Encoded | Delta Range | Incoming | Outgoing |
|Bits,n | | Frequency | Frequency |
+--------+---------------+-----------+-----------+
|-16 |[-65535:-32768]| 0.0% | 0.0% |
|-15 |[-32767:-16384]| 0.0% | 0.0% |
|-14 |[-16383:-8192] | 0.0% | 0.0% |
|-13 |[-8191:-4096] | 0.3% | 0.0% |
|-12 |[-4095:-2048] | 0.0% | 0.4% |
|-11 |[-2047:-1024] | 0.0% | 4.1% |
|-10 |[-1023:-512] | 0.0% | 2.0% |
|-9 |[-511:-256] | 0.0% | 0.1% |
|-8 |[-255:-128] | 0.0% | 0.9% |
|-7 |[-127:-64] | 0.0% | 0.4% |
|-6 |[-63:-32] | 0.0% | 0.5% |
|-5 |[-31:-16] | 0.0% | 1.9% |
|-4 |[-15:-8] | 0.0% | 0.0% |
|-3 |[-7:-4] | 0.3% | 0.0% |
|-2 |[-3:-2] | 0.2% | 0.2% |
|-1 |[-1] | 6.8% | 0.0% |
|0 |[0] | 23.3% | 74.3% |
|1 |[1] | 33.4% | 0.0% |
|2 |[2:3] | 8.4% | 0.1% |
|3 |[4:7] | 6.9% | 0.0% |
|4 |[8:15] | 3.8% | 0.1% |
|5 |[16:31] | 2.8% | 0.1% |
|6 |[32:63] | 2.3% | 0.8% |
|7 |[64:127] | 3.4% | 0.2% |
|8 |[128:255] | 1.2% | 0.4% |
|9 |[256:511] | 2.7% | 0.8% |
|10 |[512:1023] | 2.4% | 2.1% |
|11 |[1024:2047] | 1.4% | 8.2% |
|12 |[2048:4095] | 0.0% | 1.8% |
|13 |[4096:8191] | 0.4% | 0.4% |
|14 |[8192:16383] | 0.0% | 0.1% |
|15 |[16384:32767] | 0.0% | 0.0% |
|16 |[32768:65535] | 0.0% | 0.0% |
+--------+---------------+-----------+-----------+
Table 7: Frequency distribution of Destination Port delta
5.6. TCP Sequence Number and Acknowledgement Number
The TCP Sequence Number (SEQNUM) cannot be replicated as the
inter-flow delta is random with a uniform probability density
function, regardless of the direction of flow. The TCP
Acknowledgement Number (ACKNUM) generally follows the randomness of
SEQNUM, but a particular behaviour can be exploited for compression
of the first packet of most 'outgoing' flows. All handshaking packets
with SYN set but ACK clear (the first packet of TCP connections)
carry ACKNUM with zero value. This behaviour is unique to 'outgoing'
flows because service-requesting clients typically send the first
packet of a TCP connection. The first 'incoming' packet typically
carries both SYN and ACK set, and its ACKNUM is non-zero. Because up
to the third packet of each flow may be replicated, such packets
represent at least 30%, and up to 100%, of all 'outgoing' replicated
packets. Thus, ACKNUM can at worst be compressed as shown in Fig. 3.
Alternatively, instead of basing the specifications on asymmetry, all
compressor-decompressor pairs can treat the SYN-set ACK-not-set case
as a flag to infer that the value of ACKNUM is 0. These fields are
already appropriately handled as prescribed in [ROHC-TCP].
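The inference rule described above can be sketched in Python as follows. This is only an illustration under the assumption that the SYN and ACK flags are available to both compressor and decompressor before ACKNUM is processed; the function names are hypothetical and do not come from [ROHC-TCP].

```python
from typing import Optional


def compress_acknum(syn: bool, ack: bool, acknum: int) -> Optional[int]:
    """Return None when ACKNUM can be inferred, else the value to send."""
    if syn and not ack and acknum == 0:
        return None      # nothing sent; decompressor infers zero
    return acknum        # sent uncompressed as 32 bits


def decompress_acknum(syn: bool, ack: bool, sent: Optional[int]) -> int:
    """Rebuild ACKNUM from the flags and the optionally sent value."""
    if sent is not None:
        return sent
    if syn and not ack:
        return 0
    raise ValueError("ACKNUM missing but flags do not allow inference")


# First packet of a client-initiated connection: SYN set, ACK clear.
first_packet = compress_acknum(syn=True, ack=False, acknum=0)
# Server reply: SYN and ACK set, ACKNUM non-zero (example value).
reply_packet = compress_acknum(syn=True, ack=True, acknum=305419896)
```

Because the rule depends only on the flags, it works the same way for both directions and needs no asymmetrical packet format.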
5.7. TCP Flags and Urgent Pointer
"TCP Flags" refers to the TCP group of six flags: URG (Urgent), ACK
(Acknowledgement), PSH (Push), RST (Reset), SYN (Synchronize) and FIN
(Finish).
The URG flag was found to be unset in almost our entire sample,
i.e. it is much more likely to be 0 than 1. In some applications,
however, the URG flag may be used extensively. Thus, it can be
encoded as IRREGULAR(1,100%). The URG flag is also useful for
indicating the presence of the Urgent Pointer field. The
compressor-decompressor pair can treat the Urgent Pointer as
IRREGULAR when URG is set and as zero when URG is not set.
ACK is clear only in the first handshaking packet of each
connection (the same packets that carry a zero ACKNUM), as well as
in a small minority of packets with RST set. Since the proportion of
IR-CR packets carrying an unset ACK can range from 33% to 100%, it
should be sent as IRREGULAR(1,100%).
PSH was found to be varying unpredictably between 0 and 1, and is
thus best left as IRREGULAR(1,100%).
There is high correlation between RST, SYN and FIN behaviour,
allowing them to be encoded together. RST and FIN are not set in
almost 100% of replicated packets. These three flags can
therefore be encoded as: VALUE(3,2,30%) | VALUE(3,0,65%) |
IRREGULAR(3,5%). Equivalently, these three flags can also be
encoded as prescribed in [ROHC-TCP] using the "index" encoding
method, with FIN or RST exclusively set as the two other common
values.
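The joint encoding of RST, SYN and FIN can be sketched as a simple format selection. The Python fragment below assumes that the three flags are packed in TCP header order with RST as the most significant bit, so that the value 2 means "SYN only" and 0 means "none set"; the function names are hypothetical and the cost of signalling which alternative was chosen is not modelled.

```python
from typing import Tuple


def pack_rsf(rst: bool, syn: bool, fin: bool) -> int:
    """Pack RST, SYN, FIN into a 3-bit value (RST is the high bit)."""
    return (int(rst) << 2) | (int(syn) << 1) | int(fin)


def choose_rsf_format(rst: bool, syn: bool, fin: bool) -> Tuple[str, int]:
    """Return (format name, payload bits) for the three flags."""
    value = pack_rsf(rst, syn, fin)
    if value == 2:
        return ("VALUE(3,2)", 0)     # SYN only: nothing to send
    if value == 0:
        return ("VALUE(3,0)", 0)     # no flag set: nothing to send
    return ("IRREGULAR(3)", 3)       # any other combination: 3 bits


handshake = choose_rsf_format(rst=False, syn=True, fin=False)
data_packet = choose_rsf_format(rst=False, syn=False, fin=False)
teardown = choose_rsf_format(rst=False, syn=False, fin=True)
```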
5.8. TCP Window
Table 8 shows the Window delta distribution. For flows in both
directions, the main peak is at delta = 0, with amplitude 43% for
'outgoing' replicated packets and 30% for 'incoming' packets. We can
encode these cases with STATIC encoding.
+--------+---------------+-----------+-----------+
|Encoded | Delta Range | Incoming | Outgoing |
|Bits,n | | Frequency | Frequency |
+--------+---------------+-----------+-----------+
|-16 |[-65535:-32768]| 0.0% | 0.0% |
|-15 |[-32767:-16384]| 3.4% | 2.8% |
|-14 |[-16383:-8192] | 0.2% | 0.4% |
|-13 |[-8191:-4096] | 14.0% | 2.1% |
|-12 |[-4095:-2048] | 20.7% | 0.9% |
|-11 |[-2047:-1024] | 1.3% | 0.1% |
|-10 |[-1023:-512] | 6.6% | 1.7% |
|-9 |[-511:-256] | 4.4% | 2.3% |
|-8 |[-255:-128] | 4.1% | 0.8% |
|-7 |[-127:-64] | 0.6% | 2.6% |
|-6 |[-63:-32] | 0.4% | 1.2% |
|-5 |[-31:-16] | 0.2% | 0.7% |
|-4 |[-15:-8] | 0.1% | 0.5% |
|-3 |[-7:-4] | 0.1% | 0.1% |
|-2 |[-3:-2] | 0.2% | 0.0% |
|-1 |[-1] | 0.2% | 0.0% |
|0 |[0] | 30.4% | 43.2% |
|1 |[1] | 0.1% | 0.0% |
|2 |[2:3] | 0.1% | 0.1% |
|3 |[4:7] | 0.1% | 0.1% |
|4 |[8:15] | 0.1% | 0.2% |
|5 |[16:31] | 0.2% | 0.2% |
|6 |[32:63] | 0.1% | 0.8% |
|7 |[64:127] | 0.4% | 1.7% |
|8 |[128:255] | 0.2% | 3.4% |
|9 |[256:511] | 1.1% | 4.0% |
|10 |[512:1023] | 1.1% | 6.8% |
|11 |[1024:2047] | 2.0% | 3.0% |
|12 |[2048:4095] | 0.5% | 0.1% |
|13 |[4096:8191] | 2.3% | 0.3% |
|14 |[8192:16383] | 2.5% | 3.2% |
|15 |[16384:32767] | 0.1% | 3.5% |
|16 |[32768:65535] | 2.2% | 13.1% |
+--------+---------------+-----------+-----------+
Table 8: Frequency distribution of Window delta
Unlike other fields, Window delta values tend not to cluster
near the main peak. This is expected behaviour, and it means that
LSB would not be a suitable encoding method for the Window field. A
number of secondary peaks can be observed in Table 8, which suggests
that Windows tend to vary among a few discontinuous but commonly
used values.
We determine the most common Window values for 'incoming' and
'outgoing' flows separately and obtain a distribution of these
common Window values. This is shown in Table 9. It can
be observed again that asymmetry is inherent between 'incoming' and
'outgoing' flows. In this case, asymmetry is due to the use of a
different range of popular Window values between 'incoming' and
'outgoing' flows. 'Incoming' advertised Window fields typically come
from HTTP servers that send more data than they receive. Servers
typically advertise their receiver window conservatively and are slow
to grow their windows, to prevent data overloads from handling
multiple clients concurrently, and because of the congestion window
slow start algorithm [RFC-2581]. On the other
hand, sources of 'outgoing' traffic are normally clients downloading
data from servers. To utilize bandwidth efficiently, the advertised
window is usually large, often right from the first packet. This is
consistent with recent proposals for increasing the TCP initial
Window size [RFC-3390].
+----------------------+----------------------+
| Incoming | Outgoing |
+--------+-------------+--------+-------------+
| Value | Probability | Value | Probability |
| | (%) | | (%) |
+--------+-------------+--------+-------------+
| 1380 | 1.1 | 1460 | 1.6 |
| 1460 | 23.5 | 2920 | 1.6 |
| 2760 | 1.3 | 8192 | 3.1 |
| 2920 | 22.2 | 8280 | 6.6 |
| 5840 | 2.2 | 16384 | 10.3 |
| 8280 | 11.7 | 16560 | 8.0 |
| 11680 | 4.9 | 64240 | 26.3 |
| 16384 | 6.9 | 64860 | 8.8 |
| 16560 | 2.1 | 65520 | 2.6 |
| 65535 | 4.6 | 65535 | 18.3 |
+--------+-------------+--------+-------------+
| Total | 80.4 | - | 87.2 |
+--------+-------------+--------+-------------+
Table 9: Common Window field values
The common values of the Window field, including all values found
in Table 9, can typically be expressed as either (i) a multiple of
the Maximum Segment Size of the end-to-end channel, or (ii) a power
of 2, possibly with an offset of 1.
The Maximum Segment Size (MSS) is negotiated between both TCP
endpoints, through the TCP Options in TCP handshaking packets. The
negotiated MSS is in turn derived from the IP Maximum Transmission
Unit (MTU) of the underlying network [RFC-1122]. The MTU over
Ethernet is 1500 bytes, or 1492 if used with Sub-network Attachment
Point (SNAP), or 1300 if used with PPP over Ethernet (for ADSL
links). Subtracting 40 bytes for the TCP/IPv4 protocol stack, or 60
bytes for the TCP/IPv6 protocol stack, or 120 bytes for the maximum
TCP/IP header size, typically advertised MSS values are 1460, 1380,
1260, 1440 or 1452 bytes, in decreasing popularity. From the above
set of MSS values, 1460 and 1380 are used almost exclusively.
Consequently, almost all the Window values found in Table 9 can be
expressed as multiples of either 1460 or 1380. Exceptions are 8192,
16384 and 65535, which are powers of 2 with a possible offset of 1,
and 65520, which is a multiple of 1260.
Thus, commonly used Window values not expressible as multiples
of the MSS values are powers of 2, possibly with an offset of 1.
From Table 9, 8192, 16384 and 65535 are 2^13, 2^14 and 2^16 - 1
respectively.
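The classification of Window values described above can be checked with a short Python sketch. The MSS list is the one given in the text; the function name, the tuple-based return format and the order in which the MSS values are tried are assumptions made for illustration and are not the encoding method of [TCP-WIN].

```python
from typing import Tuple

# MSS values named in the text, in decreasing popularity.
COMMON_MSS = (1460, 1380, 1260, 1440, 1452)

# Window values listed in Table 9 (both directions, duplicates removed).
TABLE_9_VALUES = [1380, 1460, 2760, 2920, 5840, 8280, 11680, 16384,
                  16560, 65535, 8192, 64240, 64860, 65520]


def classify_window(window: int) -> Tuple:
    """Classify a Window value as an MSS multiple or a power of 2.

    Returns ("mss_multiple", mss, factor), or
    ("power_of_2", exponent, offset) with window = 2^exponent + offset
    and offset in {0, -1}, or ("other",) if neither pattern fits.
    """
    if window <= 0:
        return ("other",)
    for mss in COMMON_MSS:
        if window % mss == 0:
            return ("mss_multiple", mss, window // mss)
    for offset in (0, -1):
        power = window - offset
        if power & (power - 1) == 0:       # exactly one bit set
            return ("power_of_2", power.bit_length() - 1, offset)
    return ("other",)


classified = {value: classify_window(value) for value in TABLE_9_VALUES}
```

With this list of MSS values, every entry of Table 9 falls into one of the two categories, for example 64240 as 44 x 1460 and 65535 as 2^16 - 1.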
Also, the TCP Window is always 0 when RST (Reset flag) is set.
Therefore, the decompressor can infer the Window value whenever
RST is set and there is no need to send it.
The TCP Window field is used in both congestion and flow
control. The use of congestion control can account partly for the
commonly used values discussed above, as congestion control changes
are in multiples of the MSS. However, values due to flow control do
not follow the pattern discussed above but are typically small
offsets from the above commonly used values.
Currently, the Window field is either encoded as STATIC or IRREGULAR
for context replication [ROHC-TCP]. The above observations illustrate
that the current encoding methods do not make sufficient use of
the unique behaviour of the Window field. They also provide the
motivation for devising a more efficient way of encoding the Window
field. This encoding method is elaborated upon in [TCP-WIN].
5.9. TCP Checksum
The TCP Checksum field covers the pseudo-header, payload and TCP
header, and varies between packets. Although ROHC packets may contain
a CRC field, the CRC does not cover the payload. Since it is
important to preserve data integrity, the Checksum field is sent
uncompressed as IRREGULAR(16,100%).
5.10. TCP Options
TCP options contain a wide variety of optional fields, but commonly
used options include the MSS, Window Scale and SACK-Permitted found
in handshaking packets. These fields do not change between replicated
packets and can thus be compressed efficiently as STATIC for context
replication.
5.11. Mean Sizes of Compressed Fields
Table 10 shows the TCP/IP fields found in 'incoming' IR-CR packets
and calculates the mean sizes of their encoded forms. Compressed
TCP/IP fields take up a mean size of 107.3 bits for 'incoming' flows.
By repeating the calculation based on 'outgoing' packet format
specifications, it can be shown that the mean 'outgoing' IR-CR size
is 97.5 bits.
+---------------------+------+--------------------------+-------+
|                     | Size |  Encoded size (bits) &   | Mean  |
|        Field        |(bits)|       probability        |Encoded|
|                     |      |                          | Size  |
|                     |      |                          | (bits)|
+---------------------+------+--------------------------+-------+
|IPv4 Identification  |  16  | 3(50%) | 8(17%) | 16(33%)|  8.14 |
|IPv4 Don't Fragment  |   1  | 0(73%) | 1(27%)          |  0.27 |
|IPv4 Time To Live    |   8  | 0(98%) | 8(2%)           |  0.16 |
|IPv4 Dest. Address   |  32  | 0(98%) | 32(2%)          |  0.64 |
|TCP Source Port      |  16  | 0(70%) | 16(30%)         |  4.80 |
|TCP Dest. Port       |  16  | 3(73%) | 8(14%) | 16(13%)|  5.39 |
|TCP Sequence Number  |  32  | 32(100%)                 | 32    |
|TCP Ack. Num         |  32  | 32(100%)                 | 32    |
|TCP flags            |   8  | 2(95%) | 5(5%)           |  2.15 |
|TCP Window           |  16  | 0(30%) | 6(47%) | 4(8%)  |  5.54 |
|                     |      | | 16(15%)                |       |
|TCP Checksum         |  16  | 16(100%)                 | 16    |
|TCP Urgent Pointer   |  16  | 0(99%) | 16(1%)          |  0.16 |
+---------------------+------+--------------------------+-------+
|TOTAL                | 209  |            -             | 107.3 |
+---------------------+------+--------------------------+-------+
Table 10: Mean Encoded Sizes of 'incoming' TCP/IP Fields
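The mean encoded sizes in Table 10 are probability-weighted sums of the encoded sizes of each alternative. The Python sketch below repeats that arithmetic with the (bits, probability) pairs transcribed from the table; the dictionary layout and function name are illustrative choices only.

```python
from typing import Dict, List, Tuple

# (encoded bits, probability) alternatives per field, from Table 10.
INCOMING_FORMATS: Dict[str, List[Tuple[int, float]]] = {
    "IPv4 Identification": [(3, 0.50), (8, 0.17), (16, 0.33)],
    "IPv4 Don't Fragment": [(0, 0.73), (1, 0.27)],
    "IPv4 Time To Live": [(0, 0.98), (8, 0.02)],
    "IPv4 Dest. Address": [(0, 0.98), (32, 0.02)],
    "TCP Source Port": [(0, 0.70), (16, 0.30)],
    "TCP Dest. Port": [(3, 0.73), (8, 0.14), (16, 0.13)],
    "TCP Sequence Number": [(32, 1.00)],
    "TCP Ack. Num": [(32, 1.00)],
    "TCP flags": [(2, 0.95), (5, 0.05)],
    "TCP Window": [(0, 0.30), (6, 0.47), (4, 0.08), (16, 0.15)],
    "TCP Checksum": [(16, 1.00)],
    "TCP Urgent Pointer": [(0, 0.99), (16, 0.01)],
}


def mean_encoded_size(alternatives: List[Tuple[int, float]]) -> float:
    """Expected number of bits over all encoding alternatives."""
    return sum(bits * probability for bits, probability in alternatives)


per_field = {name: mean_encoded_size(alts)
             for name, alts in INCOMING_FORMATS.items()}
total_mean_bits = sum(per_field.values())
```

Summing the per-field means gives approximately 107.25 bits, which Table 10 reports rounded as 107.3.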
6. Handling Asymmetrical Inter-flow Behaviour
From the previous section, and as summarized in Fig. 3, some TCP/IP
fields exhibit inherently asymmetrical behaviour. The issue, then, is
to explore various ways of handling such asymmetrical behaviour such
that the gain versus complexity tradeoff can be optimized.
As observable from the header compression model in Fig. 1 and
asymmetrical packet format specifications in Fig. 3, asymmetrical
inter-flow behaviour can be handled by asymmetrical header
compression. This can be done by configuring each
compressor-decompressor pair with a different set of packet format
specifications, based on its 'incoming' or 'outgoing' role. While
this treatment has the highest compression efficiency, its main
disadvantage is that it may be more complicated than symmetrical
header compression.
Alternatively, asymmetrical behaviour can also be handled using
symmetrical packet format specifications, by expanding the use of the
'multiple_packet_formats' encoding method [ROHC-FN] to cover
asymmetrical behaviour, at the cost of using a few more
'discriminator bits'. This is the methodology being adopted in
current ROHC drafts.
From Fig. 3, the fields exhibiting significant asymmetrical behaviour
are the IP Destination Address, TCP Source Port, Destination Port and
Acknowledgement Number. (The behaviour of TCP Window is in fact
also asymmetrical, but this asymmetry cannot be expressed using
current encoding methods.) To handle these fields symmetrically, the
following packet format specifications can be used instead:
Destination_Address    ::= STATIC(.) | IRREGULAR(32,.)
                           % 1 discriminator bit
Source_Port            ::= STATIC(.) | LSB(3,0,.) | LSB(8,0,.) |
                           IRREGULAR(16,.)
                           % 2 discriminator bits
Destination_Port       ::= STATIC(.) | LSB(3,0,.) | LSB(8,0,.) |
                           IRREGULAR(16,.)
                           % 2 discriminator bits
Acknowledgement_Number ::= VALUE(32,0,.) | IRREGULAR(32,.)
                           % 1 discriminator bit
Fig. 4: Symmetrical packet format specifications for fields with
asymmetrical behaviour
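The discriminator bit counts quoted for these specifications follow from the number of alternatives per field. A minimal Python sketch is given below, assuming a fixed-length discriminator of ceil(log2(N)) bits for N alternatives; an actual ROHC-FN implementation may assign variable-length indicators instead, and the names used here are illustrative.

```python
from typing import Dict, List

# Alternatives per field, with probabilities omitted.
SYMMETRICAL_FORMATS: Dict[str, List[str]] = {
    "Destination_Address": ["STATIC", "IRREGULAR(32)"],
    "Source_Port": ["STATIC", "LSB(3,0)", "LSB(8,0)", "IRREGULAR(16)"],
    "Destination_Port": ["STATIC", "LSB(3,0)", "LSB(8,0)",
                         "IRREGULAR(16)"],
    "Acknowledgement_Number": ["VALUE(32,0)", "IRREGULAR(32)"],
}


def discriminator_bits(alternatives: int) -> int:
    """Bits needed to tell `alternatives` formats apart (fixed length)."""
    if alternatives < 1:
        raise ValueError("a field needs at least one format")
    return (alternatives - 1).bit_length()   # equals ceil(log2(N))


bits_per_field = {name: discriminator_bits(len(formats))
                  for name, formats in SYMMETRICAL_FORMATS.items()}
total_discriminator_bits = sum(bits_per_field.values())
```

Under this assumption the four fields cost 1 + 2 + 2 + 1 = 6 discriminator bits per IR-CR packet, which is the price of handling the asymmetry symmetrically.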
The asymmetrical behaviour of the Window field may be handled
efficiently using a proposed encoding method as elaborated in
[TCP-WIN]. This encoding method can be either symmetrical or
asymmetrical.
7. Security Considerations
This document does not introduce any new security considerations.
8. References
[RFC-3390] Allman, M., Floyd, S., Partridge, C., "Increasing TCP's
Initial Window", RFC 3390, October 2002.
[RFC-3095] Bormann, C., Burmeister, C., Degermark, M., Fukushima,
H., Hannu, H., Jonsson, L-E., Hakenberg, R., Koren, T.,
Le, K., Liu, Z., Martensson, A., Miyazaki, A., Svanbro,
K., Wiebke, T., Yoshimura, T. and H. Zheng, "RObust
Header Compression (ROHC): Framework and four profiles:
RTP, UDP, ESP, and uncompressed", RFC 3095, July 2001.
[RFC-2581] Allman, M., Paxson, V., Stevens, W., "TCP Congestion
Control", RFC 2581, April 1999.
[RFC-2234] Crocker, D., et al., "Augmented BNF for Syntax
Specifications: ABNF", RFC 2234, 1997.
[RFC-1122] Braden, R., Editor, "Requirements for Internet Hosts --
Communication Layers", RFC 1122, 1989.
[ROHC-TCP] Pelletier, G., Zhang, Q., Jonsson, L-E., Liao, H., West,
M., "RObust Header Compression (ROHC): TCP/IP Profile
(ROHC-TCP)", Internet Draft (work in progress), <draft-
ietf-rohc-tcp-04.txt>, May 2003.
[TCP-BEH] West, M. and S. McCann, "TCP/IP Field behavior", Internet
Draft (work in progress), <draft-ietf-rohc-tcp-field-
behavior-02.txt>, March 2003.
[ROHC-CR] Pelletier, G., "RObust Header Compression (ROHC): Context
Replication for ROHC Profiles", Internet Draft (work in
progress), <draft-ietf-rohc-context-replication-01.txt>,
October 2003.
[ROHC-FN] Price, R., et al., "Formal Notation for Robust Header
Compression (ROHC-FN)", Internet Draft (work in progress),
<draft-ietf-rohc-formal-notation-01.txt>, March 2003.
[EPIC-LITE] Price, R., Hancock, R., McCann, S., Surtees, A., Ollis,
P., West, M., "Framework for EPIC-LITE", Internet Draft
(work in progress), <draft-ietf-rohc-epic-lite-01.txt>,
2002.
[EPIC-IMPL] L. Vidjak, M. Stula, J. Ozegovic, "Program Structures
for EPIC-LITE Experimental Implementation", SoftCOM 2002.
[TCP-WIN] Cho, C.Y., Hazra, S.K., "Encoding Method for TCP Window
in Context Replication", Internet Draft, to be submitted.
9. Authors' Addresses
Chia Yuan Cho
Institute for Infocomm Research (I2R)
21 Heng Mui Keng Terrace
Singapore 119613
Phone: +65 6874 6643
Email: stucyc2@i2r.a-star.edu.sg
Sukanta Kumar Hazra
Institute for Infocomm Research (I2R)
21 Heng Mui Keng Terrace
Singapore 119613
Phone: +65 6874 1953
Email: sukanta@i2r.a-star.edu.sg
Appendix A. State Transition Threshold
The aim of this section is to determine a reasonable range for the
number of initial TCP/IP packets possibly converted into IR or IR-CR
packets, which is defined as the State Transition Threshold.
The compressor state machine controls the type of packet transmitted
to the decompressor. As elaborated in [ROHC-TCP], transition from the
CR state to CO state at the compressor is initiated optimistically or
explicitly through reception of an ROHC ACK from the decompressor.
Because at least 1 IR/IR-CR packet must be sent before state
transition, the State Transition Threshold H satisfies H >= 1.
The State Transition Threshold is different from simply the number of
context initializing IR/IR-CR packets sent because in uni-directional
mode or optimistic bidirectional mode, a single TCP/IP packet may be
sent as a number of duplicate IR/IR-CR packets (to allow the
compressor to gain the optimism necessary for upward transition).
A range of suitable values for H is derived from the nature of the
protocol stack and the channel characteristics. For the TCP/IP
protocol stack, we begin by looking at the first few packets
exchanged for a TCP connection.
Fig. 4 shows a TCP connection using TCP/IP header compression over a
low-bandwidth channel. Packets in the forward direction are numbered.
The first TCP packet is always converted into an IR/IR-CR packet. In
the following analysis, we focus on the compressor at the client and
the decompressor at the router.
Suppose the channel is full-duplex, and an ROHC ACK is sent upon the
successful decompression of the first packet. ROHC ACKs may be
piggybacked. The earliest possible ROHC ACK sent is indicated in Fig.
4 as a dotted arrow. When the compressor receives the ROHC ACK, it
transitions from the IR/CR state to the CO state. Subsequently, it
starts sending CO packets instead. If the channel is reliable, then
the compressor receives its ROHC ACK before it sends the second
TCP/IP packet and only a single TCP/IP packet becomes an IR/IR-CR
packet, i.e. H = 1. This is also likely if the router-server RTT >>
client-router RTT, in which case even if the first ROHC ACK is lost,
the compressor may be offered ample opportunity to receive
retransmitted ROHC ACKs before it sends packet #2. Conversely, if
the channel is unreliable, and/or if client-router RTT >>
router-server RTT (as is likely the case for cellular links), then it
is likely that the ROHC ACK is not received immediately and
subsequent TCP/IP packets are
still sent as IR-CR packets. However, as seen from Fig. 4, the time
lapse between TCP/IP packet #1 and packet #4 is long compared to all
subsequent packets (when the TCP sliding window mechanism kicks in),
and it is reasonable to assume that the ROHC ACK is received before
packet #4 is sent. Thus, a reasonable range is 1 <= H <= 3.
            Client    Router    Server
               |         |         |
           SYN |--- #1   |         |
               |   ---   |         |
               |      -->|---      |
               |      ...|   ---   |
               |   ...   |      -->|
+-- ROHC ACK   |<..      |      ---| SYN,ACK
|  (best case) |         |   ---   |
|              |      ---|<--      |
|              |   ---   |         |
|              |<--      |         |
|          ACK |--- #2   |         |
|              |   ---   |         |
|      request |--- #3-->|---      |
|              |   ---   |   ---   |
|              |      -->|---   -->|
|  large       |         |   ---   |
|  time        |         |      -->|
|  lapse       |         |      ---| reply
|              |         |   ---   |
|              |      ---|<--      |
|              |   ---   |         |
+--(worst case)|<--      |         |
               |--- #4   |         |
               |   ---   |         |
               |      -->|---      |
               |         |   ---   |
               |         |      -->|
      Compressor   Decompressor
               |_________|_________|
                   Low      Wired
                Bandwidth    or
                 Channel  Wireless
Fig. 4: TCP handshaking and ROHC ACKs
Finally, because TCP/IP contains bi-directional traffic, header
compression may occur in both directions and in this case the overall
state transition threshold is Ho = 2H. For uni-directional protocol
stacks like RTP/UDP/IP, the overall state transition threshold Ho
remains at H.
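The relationship between the per-direction threshold H and the overall threshold Ho can be written out as a small Python sketch. The function name and the enumeration over 1 <= H <= 3 are illustrative assumptions based on the range derived above.

```python
def overall_threshold(h: int, bidirectional: bool) -> int:
    """Overall State Transition Threshold Ho for a per-direction H."""
    if h < 1:
        raise ValueError("at least one IR/IR-CR packet must be sent")
    return 2 * h if bidirectional else h


# Range of Ho for TCP/IP (bi-directional) over the derived range of H.
tcp_ip_range = [overall_threshold(h, bidirectional=True)
                for h in range(1, 4)]
# Range of Ho for a uni-directional stack such as RTP/UDP/IP.
rtp_range = [overall_threshold(h, bidirectional=False)
             for h in range(1, 4)]
```

For TCP/IP this yields between 2 and 6 initial packets per connection that may be sent as IR or IR-CR packets.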