Network Working Group                                     Chia Yuan Cho
Internet-document                                   Sukanta Kumar Hazra
Expires: August 2004

                                                       February 9, 2004

                 Statistical Inter-flow Field Behaviour
                  for Context Replication in ROHC-TCP
             <draft-cho-rohc-tcp-interflow-behaviour-00.txt>

Status of This Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (C) The Internet Society (2004).  All Rights Reserved.

Abstract

   Context replication increases header compression gains by reducing
   the redundancy between flows through efficient replicate (IR-CR)
   packets. The optimal design of IR-CR packet formats requires a
   detailed understanding of inter-flow redundancy. As context
   replication is best suited to TCP, this document presents a
   statistical analysis of TCP/IP inter-flow field behaviour. Based on
   the analysis, recommendations are made for ROHC-TCP packet format
   specifications for context replication. It is also shown that
   inter-flow field behaviour is inherently and significantly
   asymmetrical, and various ways of handling this asymmetry are
   considered. Finally, based on the inter-flow behaviour of the TCP
   Window field, it is noted that current encoding methods do not
   compress this field efficiently.




Cho & Hazra                                                     [Page 1]


Internet-document  Statistical Inter-flow Field Behaviour  February 2004
                    for Context Replication in ROHC-TCP


Table of contents

   1.  Introduction

   2.  Terminology

   3.  Header Compression Model

   4.  Methodology

   5.  Results

       5.1.  IPv4 Identification
       5.2.  IP Don't Fragment and Time To Live
       5.3.  IP Destination Address
       5.4.  TCP Source Port
       5.5.  TCP Destination Port
       5.6.  TCP Sequence Number and Acknowledgement Number
       5.7.  TCP Flags and Urgent Pointer
       5.8.  TCP Window
       5.9.  TCP Checksum
       5.10. TCP Options
       5.11. Mean Sizes of Compressed Fields

   6.  Handling Asymmetrical Inter-flow Behaviour

   7.  Security Considerations

   8.  References

   9.  Authors' Addresses

   Appendix A.  State Transition Threshold


1.  Introduction

   Context replication offers an alternative to the conventional context
   initialization procedure by performing context initialization via
   more efficient IR-CR packets. In contrast to IR packets, which
   contain mostly uncompressed fields, IR-CR packets carry compressed
   header fields, obtained by reducing the redundancy between packets of
   different flows. As a result, header compression can potentially
   start from the very first packet of a flow, which improves
   compression efficiency.

   The motivations for context replication and details of the context
   replication mechanism are given in [ROHC-CR]. Although context
   replication is a general ROHC mechanism, this document focuses on
   its application to the ROHC-TCP profile in particular. This is
   because the motivation for context replication originated from the
   ROHC-TCP profile and, furthermore, because TCP flows tend to be
   'short-lived', so context replication can improve header compression
   gains most significantly for the ROHC-TCP profile.

   Context replication is possible due to significant redundancy between
   multiple simultaneous or near-simultaneous flows passing through the
   same compressor-decompressor pair. For any header compression scheme
   to work, the first step is to understand the field behaviour in
   order to recognize areas of redundancy. Context replication relies
   on the relatively unexplored inter-flow field behaviour, rather than
   the well-understood intra-flow field behaviour. [TCP-BEH] provides a
   detailed qualitative analysis of TCP/IP field behaviour; however, it
   focuses on the intra-flow rather than the inter-flow aspect, and
   this document is intended in part as an extension of it.
   Understanding and describing inter-flow field behaviour is made more
   difficult by the fact that it depends on human usage patterns in
   addition to the underlying protocol characteristics. This gives
   inter-flow field behaviour a much larger variance and a higher
   degree of uncertainty.

   In this document, a method of extracting the inter-flow field
   behaviour relevant to context replication is presented, together
   with the quantitative results of a statistical analysis of TCP/IP
   inter-flow behaviour, based on four Tcpdump traces containing 1.9
   million TCP/IP packet samples. From the results, a number of
   recommendations are made. First, a potentially optimal combination
   of encoding methods is recommended for each field during context
   replication, together with parameters and estimated probabilities of
   success for each encoding method. Second, it is shown that
   inter-flow field behaviour is significantly asymmetrical, and ways
   of handling this behaviour are explored. Finally, it is noted that
   current encoding methods can be improved upon to compress the Window
   field more efficiently.

   For verification of the replicate packet format specifications
   prescribed in this document, the EPIC-LITE implementation [EPIC-IMPL]
   from the University of Split was modified to support context
   replication.


2.  Terminology

   This document reuses some of the terminology found in [RFC-3095],
   [ROHC-TCP], [ROHC-CR], [TCP-BEH], [EPIC-LITE] and [ROHC-FN]. In
   addition, this document defines the following terms:

   'Incoming' and 'Outgoing' Packets
     'Incoming' packets are packets traveling towards client hosts
     through the channel of interest over which ROHC is employed.
     'Outgoing' packets are packets traveling away from client hosts
     through the channel of interest over which ROHC is employed.

   Asymmetrical Header Compression
     Header Compression is performed asymmetrically when 'incoming' and
     'outgoing' packets are compressed differently. This requires the
     packet format specifications for compressor-decompressor pairs to
     be configured differently depending on the direction of packet
     flow they deal with.

   Replication Match Rate
     The replication match rate for a trace is defined as the percentage
     of uni-directional flows within the trace which can be context
     replicated. A new flow is replicable when there is at least one
     suitable base context present in the compressor upon arrival of the
     first packet of the flow. The rate serves as an estimate of the
     probability that context replication can be used for context
     initialization.

   State Transition Threshold
     The State Transition Threshold for a uni-directional flow is the
     number of initial TCP/IP packets (near the start of a flow)
     converted into IR or IR-CR packets.


3. Header Compression Model

   With the objective of extracting the TCP/IP inter-flow field
   behaviour, we focus on the deployment of ROHC over the final hop. The
   ROHC compressor-decompressor pair is deployed at the two endpoints of
   the (possibly wireless) low-bandwidth channel and cooperates to
   transmit packets efficiently in the direction towards the
   decompressor. Since TCP requires a full-duplex channel, another
   compressor-decompressor pair may be present to compress packets in
   the reverse direction. Considering the direction of flow of packets
   with respect to clients using the low-bandwidth channel, packets can
   thus be classified as 'incoming' and 'outgoing'. 'Incoming' and
   'outgoing' packets use different compressor-decompressor pairs. This
   is shown in Fig. 1.

   Although ROHC was originally targeted at cellular links, the
   convergence of the telecommunication and computer communication
   industries means that it may be employed over wireless links in
   general. As such, the header compression model in Fig. 1 does not
   define the target 'low-bandwidth' channel explicitly. Mobile Terminal
   clients are connected to the Internet via a last-hop router node, as
   seen in Fig. 1; our focus is the 'header compression entity' situated
   in the data link layer of this node. This entity takes different
   forms depending on the nature of the wireless link. For example, in
   the Universal Mobile Telecommunications System (UMTS), the ROHC
   entity is part of the Packet Data Convergence Protocol (PDCP)
   sub-layer in the Radio Network Controller (RNC); if ROHC is employed
   over Wireless Ethernet (IEEE 802.11), it can be part of the data link
   layer of a wireless router; in Mobile Ad Hoc networks, the ROHC
   entity can reside on a 'forwarding node'.

    +---+  'outgoing'
    | C |---
    +---+   ---      +-------+                                +------+
    | D |<--   ---   | +---+ |                             -->|Server|
    +---+   ---   -->| | D | |    - - - - - - - - -      --   +------+
               ---   | +---+ |   /                 \   --
      'incoming'  ---| | C | |   |                 |<--
                     | +---+ |  |                   |         +------+
    Clients          |       |<->|    Internet     |<-------->|Server|
                     | +---+ |  |                   |         +------+
      'outgoing'  -->| | D | |   |                 |<--
               ---   | +---+ |   \                 /   --
    +---+   ---   ---| | C | |    - - - - - - - - -      --   +-------+
    | C |---   ---   | +---+ |                             -->| Other |
    +---+   ---      +-------+                                |Clients|
    | D |<--          Last-hop                                +-------+
    +---+ 'incoming'   Router
         |__________|         |______________________|________|

         Low-bandwidth                  Wired        Wired or Wireless
           Channel

     C - Compressor
     D - Decompressor

   Fig. 1: Header compression model showing 'incoming' and 'outgoing'
   flows

   Due to the nature of the protocol suite under study, we expect
   client-server computing to dominate over peer-to-peer, as is the case
   currently. As such, 'incoming' and 'outgoing' flows are inherently
   asymmetrical. As noted in [ROHC-TCP], some asymmetry is already
   present in TCP/IP intra-flow field behaviour. An example is the
   relationship between TCP Sequence Number and Acknowledgement Number,
   for which 'outgoing' flows are likely to exhibit large deltas between
   consecutive packets in Acknowledgement Number and small deltas in
   Sequence Number, but the converse is likely for 'incoming' flows.
   With respect to context replication, [ROHC-TCP] also acknowledges
   some inter-flow asymmetry in the TCP source/destination port.

   As will be shown in Section 5, asymmetry is even more pronounced in
   inter-flow field behaviour. Fig. 1 also illustrates that
   asymmetrical header compression, if desired, can be achieved by
   configuring compressor-decompressor pairs differently according to
   their 'incoming' or 'outgoing' role.

   Finally, it should be noted that the focus on ROHC over the final hop
   in Fig. 1 does not limit the applicability of the results on
   inter-flow behaviour. In general, header compression may be deployed
   over any hop, e.g. over core network links using Multiprotocol Label
   Switching (MPLS), or over intermediate hops in Mobile Ad Hoc
   networks. Regardless of where ROHC is deployed, the TCP/IP endpoints
   remain the same. The advantage of focusing on the last hop is that it
   allows any asymmetrical behaviour to be isolated. Over intermediate
   hops, each direction of a link may carry a mix of 'incoming' and
   'outgoing' flows, so the inherent asymmetry is lost. Nevertheless,
   the inter-flow results remain applicable over intermediate hops
   using the symmetric treatment prescribed in Section 6.


4.  Methodology

   Given the wide range of inter-flow field behaviour, a suitable
   methodology for obtaining the inter-flow field behaviour relevant to
   context replication is proposed.

   Inter-flow field behaviour can be obtained by emulating a
   context-replication-enabled compressor. To observe any asymmetrical
   behaviour, packets from the Tcpdump traces are fed into the
   'compressor emulator' separately according to their direction, i.e.
   'incoming' or 'outgoing'. The emulator thus simulates the compressors
   found on client terminals (in the 'outgoing' direction) and on
   routers (in the 'incoming' direction). Like a real compressor, the
   emulator creates, maintains and updates a list of contexts
   dynamically as each packet arrives.

   The emulator keeps an extensible list of contexts, one for each
   unique TCP connection, arranged in a Most Recently Used (MRU) stack.
   Each TCP/IP packet updates the context of its own flow. A context
   retrieved for updating or referencing is moved to the top of the
   stack; if a base context has been used as a reference at the same
   time, the base context is placed immediately below it. Whenever
   possible, each new flow is context replicated. Context replication
   is possible when a suitable base context exists; the
   implementation-dependent selection criteria used here require the
   base context to share the same IP source address, and prefer, but do
   not require, the same IP destination address. For simplicity, all
   contexts are assumed to be acknowledged by default. Furthermore, if
   the first packet of a flow can be context replicated, then the
   subsequent two packets of the flow are assumed to be replicated as
   well. This means that up to the first 3 packets of each flow are
   converted into IR-CR packets. This number is the upper bound of the
   State Transition Threshold range, and is based on an estimate of the
   maximum number of TCP/IP packets that may be converted into IR-CR
   packets. This is elaborated in Appendix A.
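
   The emulation procedure described above can be sketched as follows.
   This is a minimal illustration under assumptions that are not part
   of the original methodology: flows are identified by a hypothetical
   4-tuple key, contexts hold only the bookkeeping needed for the MRU
   stack and base context selection (no field values), acknowledgements
   are not modelled, and the names EmulatedCompressor, process() and
   match_rate() are hypothetical. Packets beyond the State Transition
   Threshold are labelled "CO" merely as a placeholder for any
   compressed packet type.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical key of one uni-directional TCP/IP flow:
# (IP source, IP destination, TCP source port, TCP destination port).
FlowKey = Tuple[str, str, int, int]

STATE_TRANSITION_THRESHOLD = 3  # upper bound used in this document


@dataclass(eq=False)  # contexts are compared by identity, not by value
class Context:
    key: FlowKey
    base: Optional["Context"] = None  # base context if the flow was replicated
    packets: int = 0                  # packets of this flow seen so far


class EmulatedCompressor:
    """Per-flow contexts in an MRU stack; index 0 is the top of the stack."""

    def __init__(self) -> None:
        self.stack: List[Context] = []
        self.new_flows = 0
        self.replicated_flows = 0

    def _select_base(self, key: FlowKey) -> Optional[Context]:
        # Same IP source is required; same IP destination is preferred.
        # The stack is scanned from the top, so the most recent match wins.
        same_src = [c for c in self.stack if c.key[0] == key[0]]
        for c in same_src:
            if c.key[1] == key[1]:
                return c
        return same_src[0] if same_src else None

    def _to_top(self, ctx: Context, base: Optional[Context]) -> None:
        # ctx goes on top; a base context referenced by this packet goes
        # immediately below it.
        for c in (ctx, base):
            if c is not None and c in self.stack:
                self.stack.remove(c)
        if base is not None:
            self.stack.insert(0, base)
        self.stack.insert(0, ctx)

    def process(self, key: FlowKey) -> str:
        """Return the packet type the emulated compressor would send."""
        ctx = next((c for c in self.stack if c.key == key), None)
        if ctx is None:
            self.new_flows += 1
            ctx = Context(key, base=self._select_base(key))
            if ctx.base is not None:
                self.replicated_flows += 1
        ctx.packets += 1
        if ctx.packets <= STATE_TRANSITION_THRESHOLD:
            self._to_top(ctx, ctx.base)  # base (if any) is used as reference
            return "IR-CR" if ctx.base is not None else "IR"
        self._to_top(ctx, None)
        return "CO"  # placeholder for any compressed packet type

    def match_rate(self) -> float:
        """Replication match rate in percent of new uni-directional flows."""
        if not self.new_flows:
            return 0.0
        return 100.0 * self.replicated_flows / self.new_flows


comp = EmulatedCompressor()
flows = [("10.0.0.1", "192.0.2.1", 1025, 80),
         ("10.0.0.1", "192.0.2.1", 1026, 80),
         ("10.0.0.1", "192.0.2.1", 1026, 80)]
print([comp.process(k) for k in flows])  # ['IR', 'IR-CR', 'IR-CR']
print(comp.match_rate())                 # 50.0
```

   The first flow has no base context and is initialized with IR
   packets; the second flow shares the IP source (and destination) of
   the first, so it is replicated, which gives a match rate of 50% over
   two flows.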

   Even though we show results at the upper bound of the State
   Transition Threshold, it was also found that the inter-flow field
   behaviour remains invariant at smaller State Transition Threshold
   values.

   For the purpose of this study, four Tcpdump traces totaling 1.9
   million packets were captured from within the Local Area Network of
   the Institute for Infocomm Research. The LAN configuration is shown
   in Fig. 2. Macro statistics of each trace are shown in Table 1.


   +--------+
   | Client |
   |Terminal|<-
   +--------+  -
                -  +--------+
                 ->|Last-Hop|
                 ->| Router |<-
                -  +--------+  -
   +--------+  -                -  +--------+
   | Client |<-                  ->|  NAT   |
   |Terminal|                    ->| Router |<-
   +--------+                   -  +--------+  -
                   +--------+  -                -  +--------+
                   |Last-Hop|<-                  ->| Border |<->Internet
                 ->| Router |                    ->|Gateway |
                -  +--------+      +--------+   -  +--------+
               -                   |  NAT   |<--
         ... <-                  ->| Router |
                                -  +--------+
                               -
                        ...  <-

   Fig. 2: Configuration of Local Area Network

   Three of the four traces were captured at the Border Gateway, so that
   traffic from a large number of client terminals could be gathered in
   each trace. However, as in most LANs, Network Address Translation
   (NAT) is in use. NAT transparently changes the Source IP Address and
   Port of 'outgoing' packets, as well as the Destination IP Address and
   Port of 'incoming' packets. Packets captured at the Border Gateway
   therefore reflect the translated values rather than the original
   ones. To deal with this, the fourth trace, TCP180903, captured at a
   client terminal, was used to investigate these fields as well as to
   verify the results from the traces captured at the Border Gateway.

   +---------------+-----------+------------+------------+-----------+
   |    Trace      | TCP180803 | TCP080903a | TCP080903b | TCP180903 |
   |Identification |           |            |            |           |
   +---------------+-----------+------------+------------+-----------+
   |  Duration     |  30 min   |  30 min    |   30 min   |  27.4 hrs |
   +---------------+-----------+------------+------------+-----------+
   |   Location    |  Gateway  |  Gateway   |  Gateway   |  Client   |
   |               |  Router   |  Router    |  Router    | Terminal  |
   +---------------+-----------+------------+------------+-----------+
   |No. of packets |  516172   |   509281   |   507293   |   383594  |
   +---------------+-----------+------------+------------+-----------+
   |  Replication  |   97.5    |    94.4    |    94.3    |    93.4   |
   | Match Rate(%) |           |            |            |           |
   +---------------+-----------+------------+------------+-----------+

   Table 1: Macro statistics of TCPdump traces

   By using packets captured on our LAN, it is assumed that TCP/IP
   inter-flow field behaviour does not vary significantly between the
   wired Ethernet-based channel and the target low-bandwidth, possibly
   less reliable, channel where header compression takes place.
   Provided the header compression layer is sufficiently robust to be
   transparent, this assumption is reasonable, because the upper-layer
   (network, transport and application) protocol characteristics and
   human usage behaviour remain the same.

   The inter-flow behaviour of TCP/IP fields should be mapped using a
   system of classification in which fields within a category share the
   same characteristics. [TCP-BEH] already provides a good system of
   classification for intra-flow field behaviour: INFERRED, STATIC,
   STATIC-DEF, STATIC-KNOWN and CHANGING, where each category follows
   some general trend(s) hinting at how fields in that category may be
   compressed. For inter-flow behaviour, [TCP-BEH] uses a different
   system of classification: 'N/A', 'No', 'Yes'. Unfortunately, this
   system is less effective, because it only indicates whether a field
   is compressible for context replication, not how to compress it
   suitably. Therefore, in this document, the inter-flow field
   behaviour is classified using the same categories as for intra-flow
   behaviour: INFERRED, STATIC, STATIC-KNOWN and CHANGING, bearing in
   mind that the categories here describe inter-flow field behaviour.
   Furthermore, STATIC-DEF is merged into STATIC, because a separate
   category for fields that define a packet stream is meaningless where
   inter-flow field behaviour is concerned.

   Classification is aided by observing the range of deltas. Here,
   delta is defined as the difference between the field value in the
   current packet and the field value stored in the base context. The
   delta analysis is useful for the following reasons. For any field
   not known to be INFERRED or STATIC-KNOWN, if delta = 0 in all
   samples, then the field is a STATIC field; otherwise, the field is
   categorized as CHANGING. For CHANGING fields, further analysis of the
   range of deltas shows whether the field can still be encoded using
   the STATIC encoding method with significant probability. Where
   deltas tend to be small, the number of least significant bits needed
   (in LSB encoding) to encode the field with a significant probability
   of success can be determined. Fields whose deltas tend to be
   uniformly distributed can only be suitably encoded as IRREGULAR.
   Finally, where certain unique trends are observed for a field, the
   raw and/or network-byte-order converted field values are also
   studied.
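
   The delta-based rules above can be sketched as follows. This is a
   minimal illustration rather than the authors' tooling: the function
   names are hypothetical, and lsb_bits_needed() assumes the RFC 3095
   interpretation interval for LSB(k, p), i.e. that a delta is covered
   when -p <= delta <= 2^k - 1 - p.

```python
from typing import Optional, Sequence


def classify(deltas: Sequence[int], known: Optional[str] = None) -> str:
    """Inter-flow category of a field from its deltas against base contexts.

    `known` is "INFERRED" or "STATIC-KNOWN" when the field is known a priori
    to belong to that category; such fields are not classified by deltas.
    """
    if known in ("INFERRED", "STATIC-KNOWN"):
        return known
    return "STATIC" if all(d == 0 for d in deltas) else "CHANGING"


def static_share(deltas: Sequence[int]) -> float:
    """Fraction of samples a STATIC encoding would cover (delta == 0)."""
    return sum(d == 0 for d in deltas) / len(deltas)


def lsb_bits_needed(deltas: Sequence[int], target: float,
                    offset: int = 0) -> Optional[int]:
    """Smallest k for which LSB(k, offset) covers at least `target` of samples.

    Assumes a delta is covered when -offset <= delta <= 2**k - 1 - offset.
    """
    for k in range(33):
        lo, hi = -offset, 2 ** k - 1 - offset
        if sum(lo <= d <= hi for d in deltas) / len(deltas) >= target:
            return k
    return None  # no LSB width up to 32 bits reaches the target


print(classify([0, 0, 0]), classify([0, 1, 5]))            # STATIC CHANGING
print(lsb_bits_needed([1, 2, 3, 4, 100], 0.8, offset=-1))  # 2
```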


5.  Results

   Our initial categorization is shown in Table 2. Differences between
   the intra-flow classification (in [TCP-BEH]) and the inter-flow
   classification here are marked with '(2)'. At this stage, no
   asymmetry is observed in the categorization of 'incoming' and
   'outgoing' flows.

          +-----------------------------------+------------+
          | Field                             |Category    |
          +-----------------------------------+------------+
          |IPv4 Version                       |STATIC      |
          |IPv4 Header Length                 |STATIC-KNOWN|
          |IPv4 Type Of Service               |STATIC(1)   |
          |IPv4 ECN Capable Transport         |STATIC(1)   |
          |IPv4 Congestion Experienced        |STATIC(1)   |
          |IPv4 Packet Length                 |INFERRED    |
          |IPv4 Identification                |CHANGING    |
          |IPv4 Reserved Flag                 |STATIC(1)   |
          |IPv4 Don't Fragment Flag           |CHANGING    |
          |IPv4 More Fragments Flag           |STATIC-KNOWN|
          |IPv4 Fragment Offset               |STATIC-KNOWN|
          |IPv4 Time To Live                  |CHANGING    |
          |IPv4 Protocol                      |STATIC      |
          |IPv4 Header Checksum               |INFERRED    |
          |IPv4 Source Address                |STATIC      |
          |IPv4 Destination Address           |CHANGING(2) |
          |TCP Source Port                    |CHANGING(2) |
          |TCP Destination Port               |CHANGING(2) |
          |TCP Sequence Number                |CHANGING    |
          |TCP Acknowledgement Number         |CHANGING    |
          |TCP Data Offset                    |INFERRED    |
          |TCP Reserved                       |STATIC(1)   |
          |TCP Congestion Window Reduced      |STATIC(1)   |
          |TCP Echo Congestion Experienced    |STATIC(1)   |
          |TCP URG flag                       |CHANGING    |
          |TCP ACK flag                       |CHANGING    |
          |TCP PSH flag                       |CHANGING    |
          |TCP RST flag                       |CHANGING    |
          |TCP SYN flag                       |CHANGING    |
          |TCP FIN flag                       |CHANGING    |
          |TCP Window                         |CHANGING    |
          |TCP Checksum                       |CHANGING    |
          |TCP Urgent Pointer                 |CHANGING    |
          |TCP Options                        |CHANGING    |
          +-----------------------------------+------------+
   (1)These fields were found to be STATIC from samples, but context
      replication should follow the classification in [TCP-BEH] for
      future-proofing.
   (2)Differs from intra-flow classification [TCP-BEH] due to context
      replication.
           Table 2: TCP/IP Fields and Classifications


   Some fields are categorized differently in this study because of the
   currently slow adoption of the IP and TCP congestion notification
   fields. However, these fields are expected to be used in the future
   and should then be treated as CHANGING rather than STATIC.

   The encoding methods to be used for STATIC, STATIC-KNOWN and INFERRED
   fields are straightforward, but CHANGING fields need to be analyzed
   further, as is done in the following sub-sections. CHANGING fields
   can sometimes be encoded with STATIC, LSB, or other encoding methods
   with significant probability. For LSB encoding, the suitable number
   of least significant bits to encode the field must be determined.
   Therefore, our frequency bins are indexed by increasing values of
   ceil(log2(|delta|+1)) (this expression is explained in Section 5.1),
   which is the minimum number of bits needed to encode the delta
   values within that bin. Negative delta values are mapped to
   -ceil(log2(|delta|+1)); these bins are useful for defining the
   offset value used in LSB encoding. From the frequency tables, a
   suitable combination of encoding methods can also be derived, as
   well as the estimated probability of each encoding method being
   used.
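
   The binning rule above can be sketched as follows. The function names
   are hypothetical. The sketch relies on the identity
   ceil(log2(m + 1)) = m.bit_length() for any integer m >= 0, which
   avoids floating-point rounding in log2.

```python
from collections import Counter
from typing import Dict, Iterable


def delta_bin(delta: int) -> int:
    """Bin index n = ceil(log2(|delta|+1)), negated for negative deltas.

    For an integer m >= 0, ceil(log2(m + 1)) == m.bit_length(), so integer
    arithmetic is used instead of floating-point log2.
    """
    n = abs(delta).bit_length()
    return -n if delta < 0 else n


def frequency_table(deltas: Iterable[int]) -> Dict[int, float]:
    """Percentage of samples per bin, ordered by increasing bin index."""
    counts = Counter(delta_bin(d) for d in deltas)
    total = sum(counts.values())
    return {n: 100.0 * c / total for n, c in sorted(counts.items())}


print([delta_bin(d) for d in (-4, -1, 0, 1, 3, 4, 7, 8)])
# [-3, -1, 0, 1, 2, 3, 3, 4]
print(frequency_table([1, 1, 2, -1]))  # {-1: 25.0, 1: 50.0, 2: 25.0}
```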

   The inter-flow behaviour of CHANGING fields can be summarized
   directly in the form of packet format specifications for IR-CR
   packets. This is shown in Fig. 3 in EPIC-LITE notation [EPIC-LITE],
   which is derived from the BNF input language [RFC-2234]. To
   illustrate asymmetrical inter-flow behaviour, whenever a field's
   packet format specification differs between 'incoming' and
   'outgoing' flows, it is defined separately for each direction with
   the suffix "_in" or "_out". Note, however, that where the same set
   of encoding methods is used in both directions for a field and only
   the probabilities differ, this suggests that no significant
   asymmetrical behaviour has been observed for that field.

   Identification_in ::= NBO(16) ;network byte order
                         LSB(3,-1,50%) | LSB(8,-1,17%) | IRREGULAR(16,33%)
   Identification_out ::= NBO(16)
                          LSB(3,-1,65%) | LSB(8,-1,14%) | IRREGULAR(16,21%)

   Don't_Fragment_in ::= STATIC(73%) | IRREGULAR(1,27%)
   Don't_Fragment_out ::= STATIC(99%) | IRREGULAR(1,1%)

   Time_To_Live_in ::= STATIC(98%) | IRREGULAR(8,2%)
   Time_To_Live_out ::= STATIC(97%) | IRREGULAR(8,3%)

   Destination_Address_in ::= STATIC(100%)
   Destination_Address_out ::= STATIC(86%) | IRREGULAR(32,14%)

   Source_Port_in  ::= STATIC(70%) | IRREGULAR(16,30%)
   Source_Port_out ::= LSB(3,0,73%) | LSB(8,0,14%) | IRREGULAR(16,13%)

   Destination_Port_in ::= LSB(3,0,73%) | LSB(8,0,14%) |
                           IRREGULAR(16,13%)
   Destination_Port_out  ::= STATIC(70%)| IRREGULAR(16,30%)

   Sequence_Number ::= IRREGULAR(32,100%)

   Acknowledgement_Number_in ::= IRREGULAR(32,100%)
   Acknowledgement_Number_out ::= VALUE(32,0,33%) | IRREGULAR(32,67%)

   URG_flag ::= IRREGULAR(1,100%)

   ACK_flag ::= IRREGULAR(1,100%)

   PSH_flag ::= IRREGULAR(1,100%)

   RST_SYN_FIN_flag ::= VALUE(3,2,30%) | VALUE(3,0,65%) |
                        IRREGULAR(3,5%)

   Urgent_Pointer ::= STATIC(99%) | IRREGULAR(16,1%)

   Window_in ::= STATIC(30%)| IRREGULAR(16,70%)
   Window_out ::= STATIC(43%) | IRREGULAR(16,57%)

   Fig. 3.  Packet format specifications for CHANGING fields.


   In Fig. 3, specifications are expressed in the notation used by
   EPIC-LITE instead of the Formal Notation [ROHC-FN], for several
   reasons. First, the basic encoding methods used in both are the
   same, so EPIC-LITE expressions can easily be converted into Formal
   Notation. Moreover, the equivalent of the 'multiple_packet_formats'
   encoding method in ROHC-FN, used to specify multiple encoding methods
   for a field, can be represented in a more compact form using the OR
   operator '|' in EPIC-LITE. Also, because EPIC-LITE uses Huffman
   coding, it allows the probability of each encoding method being
   successful to be expressed as a parameter, which is also useful for
   expressing the frequency of use of an encoding method. Finally, it
   allows the packet format specifications to be readily verified via
   the context replication implementation in EPIC-LITE.
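
   To illustrate why per-method probabilities matter in a Huffman-based
   scheme, the sketch below builds a binary Huffman code over the three
   alternatives of Identification_in in Fig. 3 and estimates the mean
   number of bits per field (indicator plus payload). This per-field
   construction is a simplification assumed only for illustration: as
   we understand it, EPIC-LITE combines the alternatives of all fields
   into complete compressed header formats before assigning Huffman
   indicators, so the figure below is indicative only and should not be
   read as a mean size reported in Section 5.11. The names
   huffman_lengths and ALTS are hypothetical.

```python
import heapq
from itertools import count
from typing import Dict


def huffman_lengths(probs: Dict[str, float]) -> Dict[str, int]:
    """Binary Huffman code length (in bits) for each alternative."""
    tie = count()  # unique tie-breaker so tuples never compare the dicts
    heap = [(p, next(tie), {name: 0}) for name, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)  # two least probable subtrees
        p2, _, b = heapq.heappop(heap)
        merged = {name: depth + 1 for name, depth in {**a, **b}.items()}
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]  # a single alternative needs 0 indicator bits


# Identification_in from Fig. 3: (probability, payload bits) per alternative.
ALTS = {"LSB(3,-1)": (0.50, 3), "LSB(8,-1)": (0.17, 8),
        "IRREGULAR(16)": (0.33, 16)}
lengths = huffman_lengths({k: p for k, (p, _) in ALTS.items()})
mean_bits = sum(p * (lengths[k] + bits) for k, (p, bits) in ALTS.items())
print(lengths["LSB(3,-1)"], lengths["LSB(8,-1)"],
      lengths["IRREGULAR(16)"])  # 1 2 2
print(round(mean_bits, 2))       # 9.64
```

   The most likely alternative receives the shortest indicator, so the
   expected cost is dominated by how often the 16-bit IRREGULAR case is
   needed.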

   Details of the inter-flow behaviour of each CHANGING field are
   elaborated in the following sub-sections.


5.1.  IPv4 Identification

   Table 3 shows the distribution of delta values on a logarithmic
   scale. Note that for delta > 0, the number of bits used to encode the
   delta may be expressed as n = ceil(log2(|delta|+1)), since n is the
   smallest integer for which delta <= 2^n - 1. For delta < 0, the
   equivalent mapping is n = -ceil(log2(|delta|+1)).
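
   The expression for n follows from requiring that n unsigned bits can
   represent |delta|; the examples check it against the bin boundaries
   of Table 3:

```latex
|\delta| \le 2^{n} - 1
  \iff 2^{n} \ge |\delta| + 1
  \iff n \ge \log_2\!\left(|\delta| + 1\right)
  \quad\Longrightarrow\quad
  n_{\min} = \left\lceil \log_2\!\left(|\delta| + 1\right) \right\rceil

% |\delta| = 3:  \log_2 4 = 2,            so n = 2  (bin [2:3])
% |\delta| = 4:  \log_2 5 \approx 2.32,   so n = 3  (bin [4:7])
```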


      +--------+---------------+-----------+-----------+
      |Encoded |  Delta Range  | Incoming  | Outgoing  |
      |Bits,n  |               | Frequency | Frequency |
      +--------+---------------+-----------+-----------+
      |-16     |[-65535:-32768]|   6.0%    |   2.3%    |
      |-15     |[-32767:-16384]|   4.5%    |   2.1%    |
      |-14     |[-16383:-8192] |   2.4%    |   2.1%    |
      |-13     |[-8191:-4096]  |   1.5%    |   0.8%    |
      |-12     |[-4095:-2048]  |   0.7%    |   0.6%    |
      |-11     |[-2047:-1024]  |   0.3%    |   0.3%    |
      |-10     |[-1023:-512]   |   0.2%    |   0.1%    |
      |-9      |[-511:-256]    |   0.1%    |   0.1%    |
      |-8      |[-255:-128]    |   0.1%    |   0.1%    |
      |-7      |[-127:-64]     |   0.0%    |   0.0%    |
      |-6      |[-63:-32]      |   0.0%    |   0.0%    |
      |-5      |[-31:-16]      |   0.0%    |   0.0%    |
      |-4      |[-15:-8]       |   0.0%    |   0.0%    |
      |-3      |[-7:-4]        |   0.1%    |   0.0%    |
      |-2      |[-3:-2]        |   0.2%    |   0.2%    |
      |-1      |[-1]           |   0.6%    |   0.4%    |
      |0       |[0]            |   0.3%    |   0.0%    |
      |1       |[1]            |   23.4%   |   33.7%   |
      |2       |[2:3]          |   20.6%   |   20.8%   |
      |3       |[4:7]          |   6.6%    |   10.5%   |
      |4       |[8:15]         |   3.9%    |   4.3%    |
      |5       |[16:31]        |   3.6%    |   3.3%    |
      |6       |[32:63]        |   3.6%    |   2.4%    |
      |7       |[64:127]       |   3.4%    |   2.0%    |
      |8       |[128:255]      |   2.3%    |   1.6%    |
      |9       |[256:511]      |   2.3%    |   1.2%    |
      |10      |[512:1023]     |   1.7%    |   1.2%    |
      |11      |[1024:2047]    |   1.4%    |   1.1%    |
      |12      |[2048:4095]    |   0.9%    |   1.0%    |
      |13      |[4096:8191]    |   1.3%    |   1.1%    |
      |14      |[8192:16383]   |   2.5%    |   2.3%    |
      |15      |[16384:32767]  |   3.0%    |   2.4%    |
      |16      |[32768:65535]  |   2.4%    |   1.9%    |
      +--------+---------------+-----------+-----------+

   Table 3: Frequency distribution of Identification delta

   Table 3 shows slightly asymmetrical behaviour. 'Incoming' replicated
   packets are less likely than 'outgoing' replicated packets to have a
   delta encodable within 3 bits. Moreover, 'incoming' delta values are
   more widely spread, with more negative deltas and more deltas
   encodable in 6 to 10 bits. This is expected: 'incoming' replicated
   packets typically come from busy servers handling many connections
   simultaneously or near-simultaneously, which produces larger deltas.
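   The shares compared above can be recomputed from Table 3. The sketch
   below simply sums the listed percentages per direction; the names
   TABLE3 and share are illustrative.

```python
# Table 3 frequencies in percent: bin n -> (incoming, outgoing).
TABLE3 = {
    -16: (6.0, 2.3), -15: (4.5, 2.1), -14: (2.4, 2.1), -13: (1.5, 0.8),
    -12: (0.7, 0.6), -11: (0.3, 0.3), -10: (0.2, 0.1), -9: (0.1, 0.1),
    -8: (0.1, 0.1), -7: (0.0, 0.0), -6: (0.0, 0.0), -5: (0.0, 0.0),
    -4: (0.0, 0.0), -3: (0.1, 0.0), -2: (0.2, 0.2), -1: (0.6, 0.4),
    0: (0.3, 0.0), 1: (23.4, 33.7), 2: (20.6, 20.8), 3: (6.6, 10.5),
    4: (3.9, 4.3), 5: (3.6, 3.3), 6: (3.6, 2.4), 7: (3.4, 2.0),
    8: (2.3, 1.6), 9: (2.3, 1.2), 10: (1.7, 1.2), 11: (1.4, 1.1),
    12: (0.9, 1.0), 13: (1.3, 1.1), 14: (2.5, 2.3), 15: (3.0, 2.4),
    16: (2.4, 1.9),
}
INCOMING, OUTGOING = 0, 1


def share(bins, direction):
    """Percentage of replicated packets whose delta falls in the bins."""
    return round(sum(TABLE3[n][direction] for n in bins), 1)


for label, bins in [("1 to 3 bits", range(1, 4)),
                    ("negative", range(-16, 0)),
                    ("6 to 10 bits", range(6, 11))]:
    print(label, share(bins, INCOMING), share(bins, OUTGOING))
```

   This gives 50.6% ('incoming') versus 65.0% ('outgoing') for deltas
   encodable in 1 to 3 bits, 16.7% versus 9.1% for negative deltas, and
   13.3% versus 8.4% for deltas encodable in 6 to 10 bits.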

   Inter-flow Identification deltas for 'outgoing' replicated packets
   tend to be smaller than for 'incoming' ones, as clients do not
   usually maintain a large number of simultaneous or near-simultaneous
   TCP connections.

   Note that Table 3 depicts network-byte-order corrected
   Identification deltas. Typical IPv4 Identification increment
   policies are sequential (increments by 1), sequential-jump
   (typically increments by 256) and random. Linux-based
   implementations usually use the sequential policy, while older
   versions of Microsoft Windows usually use the sequential-jump policy
   with a jump size of 256. The latter is equivalent to incrementing
   the more significant byte of the two-byte Identification field by 1,
   i.e. a sequential counter with its two bytes swapped. From a
   compression viewpoint, the Identification values of sequential-jump
   implementations can therefore be byte-swapped (network byte order
   corrected) at the compressor and reverted to their original form at
   the decompressor. This allows Identification fields generated under
   both policies to be compressed efficiently with the same encoding
   method; a network byte order (NBO) flag is communicated to
   differentiate between the two policies. Randomly incremented
   Identification values cannot be compressed efficiently and are sent
   as-is.
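   A minimal sketch of network byte order correction, assuming that
   correction means swapping the two bytes of the 16-bit Identification
   field. The helper names, and the heuristic used to decide when to set
   the NBO flag, are hypothetical and not taken from [ROHC-TCP].

```python
def byte_swap16(ip_id: int) -> int:
    """Swap the two bytes of a 16-bit IPv4 Identification value."""
    return ((ip_id & 0x00FF) << 8) | ((ip_id >> 8) & 0x00FF)


def delta16(prev: int, cur: int) -> int:
    """Forward delta from prev to cur, modulo 2**16."""
    return (cur - prev) % 0x10000


def prefers_nbo_correction(prev_id: int, cur_id: int) -> bool:
    # Hypothetical heuristic: set the NBO flag when byte-swapping
    # gives a smaller forward delta than the value as received.
    swapped = delta16(byte_swap16(prev_id), byte_swap16(cur_id))
    return swapped < delta16(prev_id, cur_id)


# A sequential-jump host: on the wire the ID grows by 256 per packet,
# and the low byte changes when the high byte wraps around.
wire_ids = [0xFE00, 0xFF00, 0x0001, 0x0101]
corrected = [byte_swap16(i) for i in wire_ids]
print(corrected)                                 # [254, 255, 256, 257]
print(prefers_nbo_correction(0x0100, 0x0200))    # True
```

   Because the swap is its own inverse, the decompressor can revert
   the corrected value by applying the same function.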

   Current proposals for context replication compress the IPv4
   Identification field into either 0 or 16 bits, using the VALUE and
   IRREGULAR encoding methods respectively. The VALUE encoding method is
   suitable for protocols such as DHCP, and does not appear in Fig. 3
   because we focus on TCP/IP. However, the inter-flow behaviour above
   shows that this field can be compressed more efficiently using LSB
   encoding, with the recommended parameters shown in Fig. 3.
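   The sketch below shows LSB encoding and decoding, assuming that
   LSB(k,p) uses the interpretation interval [v_ref - p,
   v_ref - p + 2^k - 1] defined for W-LSB in [RFC-3095], with 16-bit
   wraparound. The candidate sizes of 3, 8 and 16 bits mirror the
   encoded sizes listed for Identification in Table 10 rather than the
   Fig. 3 parameters, and all names are illustrative.

```python
def lsb_encode(value: int, k: int) -> int:
    """Keep only the k least significant bits of value."""
    return value & ((1 << k) - 1)


def lsb_decode(bits: int, v_ref: int, k: int, p: int,
               width: int = 16) -> int:
    """Value in [v_ref - p, v_ref - p + 2**k - 1] whose k LSBs are bits.

    Assumes the W-LSB interpretation interval of RFC 3095; arithmetic
    is modulo 2**width so the interval may wrap around zero.
    """
    low = v_ref - p
    return (low + (bits - low) % (1 << k)) % (1 << width)


def smallest_k(value: int, v_ref: int, p: int = 0,
               candidates=(3, 8, 16), width: int = 16) -> int:
    """Smallest candidate k for which LSB encoding round-trips."""
    for k in candidates:
        if lsb_decode(lsb_encode(value, k), v_ref, k, p, width) == value:
            return k
    raise ValueError("no candidate k covers this value")


print(smallest_k(1005, 1000))   # delta 5: 3 bits
print(smallest_k(1200, 1000))   # delta 200: 8 bits
print(smallest_k(999, 1000))    # delta -1 with p = 0: 16 bits
print(smallest_k(3, 65534))     # delta 5 across wraparound: 3 bits
```

   With p = 0 only non-negative deltas fit the small intervals; a
   positive p would also admit small negative deltas, at the cost of
   shrinking the positive range by the same amount.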


5.2.  IP Don't Fragment Flag and Time to Live

   The DF flag is a single bit which may be set or unset. Although it
   may be impractical to allow multiple encoding methods for a
   single-bit field, STATIC and IRREGULAR encoding methods are used here
   to characterize its behaviour. The IPv4 TTL (or equivalently, the
   IPv6 Hop Limit) is an 8-bit field which remains constant while the
   route between the two endpoints is unchanged; when the route does
   change (for example, due to congestion), it is better to simply send
   the field uncompressed. DF can therefore be analyzed in the same way
   as TTL: each is either encoded as STATIC or sent uncompressed as
   IRREGULAR. The observed frequency of each encoding method in our
   samples is shown in Table 4.


      +----------------+--------+-----------+
      |Encoding Method | STATIC | IRREGULAR |
      +----------------+--------+-----------+
      |          'Incoming' flows           |
      +-------------------------------------+
      |Don't Fragment  |  72.8% |   27.2%   |
      |Time To Live    |  98.1% |    1.9%   |
      +----------------+--------+-----------+
      |          'Outgoing' flows           |
      +-------------------------------------+
      |Don't Fragment  |  98.5% |    1.5%   |
      |Time To Live    |  96.9% |    3.1%   |
      +----------------+--------+-----------+

   Table 4: Percentage frequency of STATIC and IRREGULAR for DF and TTL


5.3.  IP Destination Address

   We allow an implementation to use context replication whenever
   packets share at least the same Source IP Address, even if their
   Destination IP Addresses differ. The Destination IP Address is
   therefore encoded as STATIC when it matches the base context, or as
   IRREGULAR when it does not.

   Of interest is the proportion of IR-CR packets replicated from a base
   context with the same or a different Destination IP Address, since
   it shows how effective context replication can be across different
   Destination IP Addresses. These proportions are tabulated in Table 5.


      +------------+----------+-----------+
      |            |  STATIC  | IRREGULAR |
      +------------+----------+-----------+
      | 'Incoming' |  100.0%  |    0.0%   |
      | 'Outgoing' |   85.8%  |   14.2%   |
      +------------+----------+-----------+

   Table 5: Percentage frequency of STATIC and IRREGULAR for IP
   Destination Address.

   As Table 5 shows, the results are skewed towards STATIC (same
   Destination IP Address). This is because our emulator prefers base
   contexts that share both the Source and the Destination IP Address,
   even though contexts sharing only the Source IP Address are much
   easier to find. In some measurement intervals, the proportion of
   'outgoing' IRREGULAR cases reached 48%.
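   The base-context preference described above can be sketched as
   follows. Context and select_base_context are hypothetical names, and
   falling back to the first context that shares only the Source
   Address is an assumption; this is not the emulator's actual code.

```python
from typing import NamedTuple, Optional, Sequence


class Context(NamedTuple):
    """Hypothetical minimal view of an existing compression context."""
    cid: int
    src: str
    dst: str


def select_base_context(src: str, dst: str,
                        contexts: Sequence[Context]) -> Optional[Context]:
    """Prefer a context with the same Source and Destination Address.

    Falls back to the first context sharing only the Source Address
    (the Destination Address is then sent as IRREGULAR); returns None
    if no context shares the Source Address.
    """
    same_src = [c for c in contexts if c.src == src]
    for c in same_src:
        if c.dst == dst:
            return c  # Destination Address can be encoded as STATIC
    return same_src[0] if same_src else None


contexts = [
    Context(1, "192.0.2.1", "198.51.100.7"),
    Context(2, "192.0.2.1", "203.0.113.9"),
    Context(3, "192.0.2.2", "198.51.100.7"),
]
print(select_base_context("192.0.2.1", "203.0.113.9", contexts).cid)
print(select_base_context("192.0.2.1", "203.0.113.50", contexts).cid)
```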

   Asymmetry between 'incoming' and 'outgoing' flows is again inherent.
   'Incoming' flows originate from Internet servers, which are unlikely
   to engage several clients on the same subnet within a short period
   of time. The converse holds for 'outgoing' flows: in prevalent usage
   patterns, a client contacts several different servers within a
   short period.

   Our results therefore support implementations that consider context
   replication even when the Destination IP Address differs, as this
   maximizes the context replication gains for 'outgoing' flows.


5.4.  TCP Source Port

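   As Table 6 shows, the TCP Source Port field exhibits clearly
   asymmetrical inter-flow behaviour. This is mainly because the Source
   Port of 'incoming' packets is the server's well-known port, which
   remains unchanged (delta 0 in 72% of cases), whereas 'outgoing'
   packets carry client ports that typically differ by small positive
   deltas.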

      +--------+---------------+----------+---------+
      |Encoded |  Delta Range  | Incoming |Outgoing |
      |Bits,n  |               | Frequency|Frequency|
      +--------+---------------+----------+---------+
      |-16     |[-65535:-32768]|   0.0%   |   0.0%  |
      |-15     |[-32767:-16384]|   0.0%   |   0.0%  |
      |-14     |[-16383:-8192] |   0.0%   |   0.0%  |
      |-13     |[-8191:-4096]  |   0.0%   |   0.3%  |
      |-12     |[-4095:-2048]  |   5.8%   |   0.2%  |
      |-11     |[-2047:-1024]  |   1.8%   |   0.6%  |
      |-10     |[-1023:-512]   |   0.1%   |   1.7%  |
      |-9      |[-511:-256]    |   1.0%   |   0.0%  |
      |-8      |[-255:-128]    |   0.5%   |   0.0%  |
      |-7      |[-127:-64]     |   0.7%   |   0.0%  |
      |-6      |[-63:-32]      |   0.7%   |   0.0%  |
      |-5      |[-31:-16]      |   0.0%   |   0.0%  |
      |-4      |[-15:-8]       |   0.0%   |   0.0%  |
      |-3      |[-7:-4]        |   0.3%   |   0.1%  |
      |-2      |[-3:-2]        |   0.0%   |   0.1%  |
      |-1      |[-1]           |   0.0%   |   2.3%  |
      |0       |[0]            |  72.0%   |  15.8%  |
      |1       |[1]            |   0.0%   |  31.9%  |
      |2       |[2:3]          |   0.0%   |  17.4%  |
      |3       |[4:7]          |   0.0%   |   7.8%  |
      |4       |[8:15]         |   0.1%   |   4.7%  |
      |5       |[16:31]        |   0.1%   |   3.3%  |
      |6       |[32:63]        |   0.3%   |   2.0%  |
      |7       |[64:127]       |   0.3%   |   3.0%  |
      |8       |[128:255]      |   0.7%   |   1.1%  |
      |9       |[256:511]      |   0.8%   |   2.7%  |
      |10      |[512:1023]     |   3.0%   |   3.2%  |
      |11      |[1024:2047]    |  10.5%   |   1.5%  |
      |12      |[2048:4095]    |   1.2%   |   0.1%  |
      |13      |[4096:8191]    |   0.0%   |   0.3%  |
      |14      |[8192:16383]   |   0.0%   |   0.0%  |
      |15      |[16384:32767]  |   0.0%   |   0.0%  |
      |16      |[32768:65535]  |   0.1%   |   0.0%  |
      +--------+---------------+----------+---------+

   Table 6: Frequency distribution of Source Port delta


5.5.  TCP Destination Port

   The inter-flow behaviour of the TCP Destination Port field is shown
   in Table 7. The trend is the opposite of that of the TCP Source Port,
   because the Destination Port of an 'outgoing' packet is the Source
   Port of the corresponding 'incoming' reply.

      +--------+---------------+-----------+-----------+
      |Encoded |  Delta Range  | Incoming  | Outgoing  |
      |Bits,n  |               | Frequency | Frequency |
      +--------+---------------+-----------+-----------+
      |-16     |[-65535:-32768]|   0.0%    |   0.0%    |
      |-15     |[-32767:-16384]|   0.0%    |   0.0%    |
      |-14     |[-16383:-8192] |   0.0%    |   0.0%    |
      |-13     |[-8191:-4096]  |   0.3%    |   0.0%    |
      |-12     |[-4095:-2048]  |   0.0%    |   0.4%    |
      |-11     |[-2047:-1024]  |   0.0%    |   4.1%    |
      |-10     |[-1023:-512]   |   0.0%    |   2.0%    |
      |-9      |[-511:-256]    |   0.0%    |   0.1%    |
      |-8      |[-255:-128]    |   0.0%    |   0.9%    |
      |-7      |[-127:-64]     |   0.0%    |   0.4%    |
      |-6      |[-63:-32]      |   0.0%    |   0.5%    |
      |-5      |[-31:-16]      |   0.0%    |   1.9%    |
      |-4      |[-15:-8]       |   0.0%    |   0.0%    |
      |-3      |[-7:-4]        |   0.3%    |   0.0%    |
      |-2      |[-3:-2]        |   0.2%    |   0.2%    |
      |-1      |[-1]           |   6.8%    |   0.0%    |
      |0       |[0]            |  23.3%    |  74.3%    |
      |1       |[1]            |  33.4%    |   0.0%    |
      |2       |[2:3]          |   8.4%    |   0.1%    |
      |3       |[4:7]          |   6.9%    |   0.0%    |
      |4       |[8:15]         |   3.8%    |   0.1%    |
      |5       |[16:31]        |   2.8%    |   0.1%    |
      |6       |[32:63]        |   2.3%    |   0.8%    |
      |7       |[64:127]       |   3.4%    |   0.2%    |
      |8       |[128:255]      |   1.2%    |   0.4%    |
      |9       |[256:511]      |   2.7%    |   0.8%    |
      |10      |[512:1023]     |   2.4%    |   2.1%    |
      |11      |[1024:2047]    |   1.4%    |   8.2%    |
      |12      |[2048:4095]    |   0.0%    |   1.8%    |
      |13      |[4096:8191]    |   0.4%    |   0.4%    |
      |14      |[8192:16383]   |   0.0%    |   0.1%    |
      |15      |[16384:32767]  |   0.0%    |   0.0%    |
      |16      |[32768:65535]  |   0.0%    |   0.0%    |
      +--------+---------------+-----------+-----------+

   Table 7: Frequency distribution of Destination Port delta


5.6.  TCP Sequence Number and Acknowledgement Number

   The TCP Sequence Number (SEQNUM) cannot be compressed by context
   replication, as its inter-flow delta is random with a uniform
   probability distribution, regardless of the direction of flow. The
   TCP Acknowledgement Number (ACKNUM) generally shares the randomness
   of SEQNUM, but one particular behaviour can be exploited to compress
   the first packet of most 'outgoing' flows: all handshaking packets
   with SYN set but ACK clear (the first packet of a TCP connection)
   carry an ACKNUM of zero. This behaviour is unique to 'outgoing' flows
   because service-requesting clients typically send the first packet
   of a TCP connection. The first 'incoming' packet typically has both
   SYN and ACK set, with a non-zero ACKNUM. Because only up to the first
   three packets of each flow may be replicated, these SYN packets make
   up roughly 33% to 100% of all 'outgoing' replicated packets. Thus,
   for 'outgoing' flows, ACKNUM can at worst be compressed as shown in
   Fig. 3.

   Alternatively, instead of basing the specifications on asymmetry,
   all compressor-decompressor pairs can treat the combination of SYN
   set and ACK clear as an implicit flag indicating that ACKNUM is 0.
   This case is already handled appropriately in [ROHC-TCP].
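   A sketch of the implicit-zero inference for ACKNUM, using the
   standard TCP flag bit values (SYN = 0x02, ACK = 0x10). The function
   names are hypothetical, and the sketch assumes the packet format can
   signal whether ACKNUM is carried explicitly; it is not the
   [ROHC-TCP] packet format.

```python
from typing import Optional

SYN = 0x02  # bit values in the TCP flags byte
ACK = 0x10


def acknum_implicitly_zero(flags: int) -> bool:
    """SYN set and ACK clear: the first packet of a TCP handshake."""
    return bool(flags & SYN) and not (flags & ACK)


def compress_acknum(flags: int, acknum: int) -> Optional[int]:
    """Return None if ACKNUM can be omitted, else the value to send."""
    if acknum_implicitly_zero(flags) and acknum == 0:
        return None
    return acknum


def decompress_acknum(flags: int, sent: Optional[int]) -> int:
    """Rebuild ACKNUM from an explicit value or the implicit rule."""
    if sent is not None:
        return sent  # ACKNUM was carried explicitly (IRREGULAR)
    if acknum_implicitly_zero(flags):
        return 0
    raise ValueError("ACKNUM omitted but cannot be inferred")


print(decompress_acknum(SYN, compress_acknum(SYN, 0)))  # 0
```

   The compressor omits ACKNUM only when it is actually zero, so a SYN
   packet carrying a non-zero ACKNUM is still reconstructed correctly.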


5.7.  TCP Flags and Urgent Pointer

   "TCP Flags" refers to the group of six TCP flags: URG (Urgent), ACK
   (Acknowledgement), PSH (Push), RST (Reset), SYN (Synchronize) and
   FIN (Finish).

   The URG flag was clear in almost our entire sample, i.e. it is much
   more likely to be 0 than 1. In some applications, however, the URG
   flag may be used extensively, so it is encoded as IRREGULAR(1,100%).
   The URG flag also indicates the presence of the Urgent Pointer
   field: the compressor-decompressor pair can treat the Urgent Pointer
   as IRREGULAR when URG is set and as zero when URG is clear.

   ACK is clear only in the first handshaking packet of each connection
   (as discussed for ACKNUM) and in a small number of packets with RST
   set. Since the proportion of IR-CR packets with ACK clear can range
   from 33% to 100%, ACK should be sent as IRREGULAR(1,100%).

   PSH was found to vary unpredictably between 0 and 1, and is thus
   best left as IRREGULAR(1,100%).

   RST, SYN and FIN are highly correlated, which allows them to be
   encoded together. RST and FIN are clear in almost 100% of replicated
   packets. Taking the three flags as a 3-bit value in their TCP header
   order (RST, SYN, FIN), so that 2 means SYN alone and 0 means none
   set, they can therefore be encoded as: VALUE(3,2,30%) |
   VALUE(3,0,65%) | IRREGULAR(3,5%). Equivalently, these three flags can
   be encoded as prescribed in [ROHC-TCP] using the "index" encoding
   method, with only FIN set or only RST set as the two other common
   values.
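   The combined RST/SYN/FIN encoding can be sketched as below. The
   discriminator codewords ('0', '10', '11') are an assumption, chosen
   as a Huffman code for the 65%, 30% and 5% probabilities; actual
   discriminators depend on the packet format. The function names are
   illustrative.

```python
RST, SYN, FIN = 0x04, 0x02, 0x01  # low three bits of the flags byte


def encode_rsf(flags: int) -> str:
    """Encode the RST/SYN/FIN bits of a TCP flags byte as a bit string.

    Hypothetical discriminators: '0' for VALUE(3,0,65%), '10' for
    VALUE(3,2,30%), and '11' plus the 3 raw bits for IRREGULAR(3,5%).
    """
    rsf = flags & 0x07
    if rsf == 0:
        return "0"
    if rsf == SYN:
        return "10"
    return "11" + format(rsf, "03b")


def decode_rsf(bits: str) -> int:
    """Inverse of encode_rsf: recover the 3-bit RST/SYN/FIN value."""
    if bits == "0":
        return 0
    if bits == "10":
        return SYN
    if len(bits) == 5 and bits.startswith("11"):
        return int(bits[2:], 2)
    raise ValueError("malformed RST/SYN/FIN field: " + bits)


# Expected size with the probabilities above: about 1.5 bits.
MEAN_BITS = 0.65 * 1 + 0.30 * 2 + 0.05 * 5
print(encode_rsf(SYN), encode_rsf(0x10), encode_rsf(FIN), MEAN_BITS)
```

   With these probabilities the expected size is 0.65 x 1 + 0.30 x 2 +
   0.05 x 5 = 1.5 bits, compared with 3 bits sent uncompressed.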


5.8.  TCP Window

   Table 8 shows the distribution of Window deltas. For flows in both
   directions, the main peak is at delta = 0, covering 43% of
   'outgoing' and 30% of 'incoming' replicated packets. These cases can
   be encoded as STATIC.

      +--------+---------------+-----------+-----------+
      |Encoded |  Delta Range  | Incoming  | Outgoing  |
      |Bits,n  |               | Frequency | Frequency |
      +--------+---------------+-----------+-----------+
      |-16     |[-65535:-32768]|   0.0%    |   0.0%    |
      |-15     |[-32767:-16384]|   3.4%    |   2.8%    |
      |-14     |[-16383:-8192] |   0.2%    |   0.4%    |
      |-13     |[-8191:-4096]  |  14.0%    |   2.1%    |
      |-12     |[-4095:-2048]  |  20.7%    |   0.9%    |
      |-11     |[-2047:-1024]  |   1.3%    |   0.1%    |
      |-10     |[-1023:-512]   |   6.6%    |   1.7%    |
      |-9      |[-511:-256]    |   4.4%    |   2.3%    |
      |-8      |[-255:-128]    |   4.1%    |   0.8%    |
      |-7      |[-127:-64]     |   0.6%    |   2.6%    |
      |-6      |[-63:-32]      |   0.4%    |   1.2%    |
      |-5      |[-31:-16]      |   0.2%    |   0.7%    |
      |-4      |[-15:-8]       |   0.1%    |   0.5%    |
      |-3      |[-7:-4]        |   0.1%    |   0.1%    |
      |-2      |[-3:-2]        |   0.2%    |   0.0%    |
      |-1      |[-1]           |   0.2%    |   0.0%    |
      |0       |[0]            |  30.4%    |  43.2%    |
      |1       |[1]            |   0.1%    |   0.0%    |
      |2       |[2:3]          |   0.1%    |   0.1%    |
      |3       |[4:7]          |   0.1%    |   0.1%    |
      |4       |[8:15]         |   0.1%    |   0.2%    |
      |5       |[16:31]        |   0.2%    |   0.2%    |
      |6       |[32:63]        |   0.1%    |   0.8%    |
      |7       |[64:127]       |   0.4%    |   1.7%    |
      |8       |[128:255]      |   0.2%    |   3.4%    |
      |9       |[256:511]      |   1.1%    |   4.0%    |
      |10      |[512:1023]     |   1.1%    |   6.8%    |
      |11      |[1024:2047]    |   2.0%    |   3.0%    |
      |12      |[2048:4095]    |   0.5%    |   0.1%    |
      |13      |[4096:8191]    |   2.3%    |   0.3%    |
      |14      |[8192:16383]   |   2.5%    |   3.2%    |
      |15      |[16384:32767]  |   0.1%    |   3.5%    |
      |16      |[32768:65535]  |   2.2%    |  13.1%    |
      +--------+---------------+-----------+-----------+

   Table 8: Frequency distribution of Window delta


   Unlike other fields, Window delta values do not cluster near the
   main peak. This is expected behaviour, and it means that LSB is not
   a suitable encoding method for the Window field. Instead, a number of
   secondary peaks can be observed in Table 8, suggesting that the
   Window tends to take a few discontinuous but commonly used values.

   We determined the most common Window values for 'incoming' and
   'outgoing' flows separately; their distribution is shown in Table 9.
   Again, asymmetry between 'incoming' and 'outgoing' flows is inherent,
   this time because the two directions use different ranges of popular
   Window values. 'Incoming' advertised Window fields typically come
   from HTTP servers, which send more data than they receive. Servers
   typically advertise their receive window conservatively and are slow
   to grow it, both to avoid data overload while handling multiple
   clients concurrently and because of the congestion window slow start
   algorithm [RFC-2581]. On the other hand, sources of 'outgoing'
   traffic are normally clients downloading data from servers. To use
   bandwidth efficiently, their advertised window is usually large,
   often right from the first packet. This is consistent with recent
   proposals for increasing the TCP initial window size [RFC-3390].


      +----------------------+----------------------+
      |       Incoming       |       Outgoing       |
      +--------+-------------+--------+-------------+
      | Value  | Probability | Value  | Probability |
      |        |     (%)     |        |     (%)     |
      +--------+-------------+--------+-------------+
      |  1380  |     1.1     |  1460  |     1.6     |
      |  1460  |    23.5     |  2920  |     1.6     |
      |  2760  |     1.3     |  8192  |     3.1     |
      |  2920  |    22.2     |  8280  |     6.6     |
      |  5840  |     2.2     | 16384  |    10.3     |
      |  8280  |    11.7     | 16560  |     8.0     |
      | 11680  |     4.9     | 64240  |    26.3     |
      | 16384  |     6.9     | 64860  |     8.8     |
      | 16560  |     2.1     | 65520  |     2.6     |
      | 65535  |     4.6     | 65535  |    18.3     |
      +--------+-------------+--------+-------------+
      | Total  |    80.4     |   -    |    87.2     |
      +--------+-------------+--------+-------------+

   Table 9: Common Window field values

   The common Window values, including all values listed in Table 9,
   can typically be expressed either as (i) a multiple of the Maximum
   Segment Size (MSS) of the end-to-end path, or (ii) a power of 2,
   possibly offset by 1.

   The MSS is negotiated between the two TCP endpoints through the TCP
   Options in the handshaking packets, and is in turn derived from the
   IP Maximum Transmission Unit (MTU) of the underlying network
   [RFC-1122]. The MTU over Ethernet is 1500 bytes, or 1492 bytes when
   used with the Subnetwork Access Protocol (SNAP), or 1300 bytes when
   used with PPP over Ethernet (for ADSL links). Subtracting 40 bytes
   for the TCP/IPv4 headers, 60 bytes for the TCP/IPv6 headers, or 120
   bytes for the maximum TCP/IP header size (a 60-byte IPv4 header plus
   a 60-byte TCP header), the typically advertised MSS values are 1460,
   1380, 1260, 1440 and 1452 bytes, in decreasing order of popularity.
   Of these, 1460 and 1380 are used almost exclusively. Consequently,
   almost all the Window values in Table 9 are multiples of either 1460
   or 1380. The exceptions are 8192, 16384 and 65535, which are powers
   of 2 possibly offset by 1, and 65520, which is a multiple of 1260.

   Thus, commonly used Window values that are not multiples of an MSS
   value are powers of 2, possibly offset by 1: from Table 9, 8192,
   16384 and 65535 are 2^13, 2^14 and 2^16 - 1 respectively.
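   The two observations can be checked against Table 9 with a short
   sketch. It assumes that "offset by 1" means 2^k - 1, as in the 65535
   example, and uses the MSS values listed above; classify_window is an
   illustrative name, and this is not the encoding method of [TCP-WIN].

```python
# MSS values from Section 5.8, derived as MTU minus header overhead.
MSS_VALUES = (
    1500 - 40,   # 1460: Ethernet, TCP/IPv4
    1500 - 120,  # 1380: Ethernet, maximum TCP/IP header size
    1300 - 40,   # 1260: PPP over Ethernet, TCP/IPv4
    1500 - 60,   # 1440: Ethernet, TCP/IPv6
    1492 - 40,   # 1452: Ethernet with SNAP, TCP/IPv4
)


def classify_window(window: int) -> str:
    """Describe a Window value as an MSS multiple, 2^k or 2^k - 1."""
    for mss in MSS_VALUES:
        if window > 0 and window % mss == 0:
            return f"{window // mss} x {mss}"
    for k in range(1, 17):
        if window == 2 ** k:
            return f"2^{k}"
        if window == 2 ** k - 1:
            return f"2^{k} - 1"
    return "other"


TABLE9_VALUES = (1380, 1460, 2760, 2920, 5840, 8192, 8280, 11680,
                 16384, 16560, 64240, 64860, 65520, 65535)
for value in TABLE9_VALUES:
    print(value, classify_window(value))
```

   Every value in Table 9 falls into one of the two categories; for
   instance, 64240 is 44 x 1460 and 65520 is 52 x 1260.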

   Also, the TCP Window is always 0 when RST (Reset flag) is set.
   Therefore, the decompressor can infer the Window value whenever
   RST is set and there is no need to send it.

   The TCP Window field is used in both congestion and flow control.
   Congestion control partly accounts for the commonly used values
   discussed above, as congestion control changes the window in
   multiples of the MSS. Values due to flow control, however, do not
   follow this pattern; they are typically small offsets from the
   commonly used values.

   Currently, the Window field is encoded as either STATIC or IRREGULAR
   for context replication [ROHC-TCP]. The above observations show that
   the current encoding methods do not sufficiently exploit the
   distinctive behaviour of the Window field. They also motivate a more
   efficient way of encoding the Window field, which is elaborated in
   [TCP-WIN].


5.9.  TCP Checksum

   The TCP Checksum field covers the pseudo-header, the TCP header and
   the payload, and therefore varies between packets. Although ROHC
   packets may contain a CRC field, the CRC does not cover the payload.
   Since it is important to preserve data integrity, the Checksum field
   is sent uncompressed as IRREGULAR(16,100%).
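   To illustrate why the Checksum varies with every packet, the sketch
   below computes the standard one's-complement Internet checksum (as
   described in RFC 1071) over an IPv4 pseudo-header, TCP header and
   payload. The header field values are made up for illustration, and
   the helper names are hypothetical.

```python
import struct


def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's-complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def tcp_checksum(src_ip: bytes, dst_ip: bytes, segment: bytes) -> int:
    """Internet checksum over the IPv4 pseudo-header and TCP segment.

    segment is the TCP header plus payload; to generate a checksum its
    Checksum field (bytes 16-17) must be zero.
    """
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 6, len(segment))
    return ~ones_complement_sum16(pseudo + segment) & 0xFFFF


# Illustrative 20-byte header: ports, SEQNUM, ACKNUM, data offset 5,
# ACK flag, Window, zeroed Checksum, Urgent Pointer.
header = struct.pack("!HHIIBBHHH", 49152, 80, 1, 0, 5 << 4, 0x10,
                     65535, 0, 0)
src, dst = bytes([192, 0, 2, 1]), bytes([198, 51, 100, 7])
segment = header + b"hello"
csum = tcp_checksum(src, dst, segment)
filled = segment[:16] + struct.pack("!H", csum) + segment[18:]
print(hex(csum))
print(tcp_checksum(src, dst, header + b"hellp") != csum)  # True
print(tcp_checksum(src, dst, filled))  # 0: the segment verifies
```

   Changing a single payload byte changes the checksum, and recomputing
   over the segment with its Checksum filled in yields 0, which is how
   a receiver verifies it.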


5.10.  TCP Options

   TCP Options can carry a wide variety of fields, but the commonly
   used options include the MSS, Window Scale and SACK-Permitted options
   found in handshaking packets. These options do not change between
   replicated packets and can thus be compressed efficiently as STATIC
   for context replication.


5.11.  Mean Sizes of Compressed Fields

   Table 10 lists the TCP/IP fields found in 'incoming' IR-CR packets
   and the mean sizes of their encoded forms. Compressed TCP/IP fields
   take up a mean of 107.3 bits for 'incoming' flows, compared with 209
   bits uncompressed. Repeating the calculation with the 'outgoing'
   packet format specifications gives a mean 'outgoing' IR-CR size of
   97.5 bits.


      +---------------------+------+--------------------------+-------+
      |                     | Size |  Encoded size (bits) &   |  Mean |
      |       Field         |      |        probability       |Encoded|
      |                     |(bits)|                          |  Size |
      |                     |      |                          | (bits)|
      +---------------------+------+--------------------------+-------+
      |IPv4 Identification  |  16  | 3(50%) | 8(17%) | 16(33%)|  8.14 |
      |IPv4 Don't Fragment  |   1  |      0(73%) | 1(27%)     |  0.27 |
      |IPv4 Time To Live    |   8  |      0(98%) | 8(2%)      |  0.16 |
      |IPv4 Dest. Address   |  32  |      0(98%) | 32(2%)     |  0.64 |
      |TCP Source Port      |  16  |     0(70%) | 16(30%)     |  4.80 |
      |TCP Dest. Port       |  16  | 3(73%) | 8(14%) | 16(13%)|  5.39 |
      |TCP Sequence Number  |  32  |         32(100%)         |   32  |
      |TCP Ack. Num         |  32  |         32(100%)         |   32  |
      |TCP flags            |   8  |      2(95%) | 5(5%)      |  2.15 |
      |TCP Window           |  16  | 0(30%) | 6(47%) | 4(8%)  |  5.54 |
      |                     |      |        | 16(15%)         |       |
      |TCP Checksum         |  16  |         16(100%)         |   16  |
      |TCP Urgent Pointer   |  16  |      0(99%) | 16(1%)     |  0.16 |
      +---------------------+------+--------------------------+-------+
      |TOTAL                | 209  |             -            | 107.3 |
      +---------------------+------+--------------------------+-------+

   Table 10: Mean Encoded Sizes of 'incoming' TCP/IP Fields
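   The mean encoded sizes in Table 10 are probability-weighted sums of
   the encoded sizes, as the sketch below recomputes. Like the table, it
   counts only the encoded field bits, not discriminator bits. The names
   TABLE10 and mean_bits are illustrative.

```python
# Table 10 rows: (field, uncompressed bits, [(encoded bits, prob.)]).
TABLE10 = [
    ("IPv4 Identification", 16, [(3, .50), (8, .17), (16, .33)]),
    ("IPv4 Don't Fragment", 1, [(0, .73), (1, .27)]),
    ("IPv4 Time To Live", 8, [(0, .98), (8, .02)]),
    ("IPv4 Dest. Address", 32, [(0, .98), (32, .02)]),
    ("TCP Source Port", 16, [(0, .70), (16, .30)]),
    ("TCP Dest. Port", 16, [(3, .73), (8, .14), (16, .13)]),
    ("TCP Sequence Number", 32, [(32, 1.0)]),
    ("TCP Ack. Num", 32, [(32, 1.0)]),
    ("TCP flags", 8, [(2, .95), (5, .05)]),
    ("TCP Window", 16, [(0, .30), (6, .47), (4, .08), (16, .15)]),
    ("TCP Checksum", 16, [(16, 1.0)]),
    ("TCP Urgent Pointer", 16, [(0, .99), (16, .01)]),
]


def mean_bits(options):
    """Expected encoded size: sum of bits weighted by probability."""
    return sum(bits * prob for bits, prob in options)


total_raw = sum(size for _, size, _ in TABLE10)
total_mean = sum(mean_bits(options) for _, _, options in TABLE10)
for field, size, options in TABLE10:
    print(f"{field:22s} {size:3d} -> {mean_bits(options):6.2f}")
print(f"TOTAL {total_raw} -> {total_mean:.2f}")
```

   The total is 107.25 bits, which Table 10 rounds to 107.3, against
   209 bits uncompressed.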


6.  Handling Asymmetrical Inter-flow Behaviour

   As shown in the previous section and summarized in Fig. 3, some
   TCP/IP fields exhibit inherently asymmetrical behaviour. The issue,
   then, is how to handle such asymmetrical behaviour with the best
   tradeoff between compression gain and complexity.

   As can be seen from the header compression model in Fig. 1 and the
   asymmetrical packet format specifications in Fig. 3, asymmetrical
   inter-flow behaviour can be handled by asymmetrical header
   compression, i.e. by configuring each compressor-decompressor pair
   with a different set of packet format specifications according to
   its 'incoming' or 'outgoing' role. While this approach achieves the
   highest compression efficiency, its main disadvantage is that it may
   be more complex than symmetrical header compression.

   Alternatively, asymmetrical behaviour can be handled with
   symmetrical packet format specifications, by extending the use of
   the 'multiple_packet_formats' encoding method [ROHC-FN] to cover
   asymmetrical behaviour at the cost of a few more discriminator bits.
   This is the methodology adopted in current ROHC drafts.

   From Fig. 3, the fields exhibiting significant asymmetrical behaviour
   are the IP Destination Address, TCP Source Port, TCP Destination Port
   and TCP Acknowledgement Number. (The behaviour of the TCP Window is
   also asymmetrical, but this asymmetry cannot be expressed using
   current encoding methods.) To handle these fields symmetrically, the
   following packet format specifications can be used instead:

   Destination_Address ::= STATIC(.) | IRREGULAR(32,.)
                           %1 discriminator bit

   Source_Port ::= STATIC(.) | LSB(3,0,.) | LSB(8,0,.) |
                   IRREGULAR(16,.) %2 discriminator bits

   Destination_Port ::= STATIC(.) | LSB(3,0,.) | LSB(8,0,.) |
                        IRREGULAR(16,.) %2 discriminator bits

   Acknowledgement_Number ::= VALUE(32,0,.) | IRREGULAR(32,.)
                              %1 discriminator bit

   Fig. 4: Symmetrical packet format specifications for fields with
   asymmetrical behaviour
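   The cost of the symmetrical Source_Port specification in Fig. 4 can
   be sketched as follows, assuming a fixed 2-bit discriminator for the
   four alternatives and that LSB(k,0,.) covers forward deltas from 0 to
   2^k - 1 (the [RFC-3095] interpretation interval with p = 0). The
   function name encode_port_symmetric is hypothetical.

```python
DISCRIMINATOR_BITS = 2  # four alternatives in Fig. 4 -> 2 fixed bits


def encode_port_symmetric(port: int, ref_port: int) -> tuple[str, int]:
    """Pick the cheapest Fig. 4 alternative; return (method, bits).

    Assumes LSB(k,0,.) covers forward deltas 0 .. 2**k - 1, i.e. the
    RFC 3095 interpretation interval with p = 0.
    """
    delta = (port - ref_port) % 0x10000
    if delta == 0:
        method, field_bits = "STATIC", 0
    elif delta < 2 ** 3:
        method, field_bits = "LSB(3,0)", 3
    elif delta < 2 ** 8:
        method, field_bits = "LSB(8,0)", 8
    else:
        method, field_bits = "IRREGULAR(16)", 16
    return method, DISCRIMINATOR_BITS + field_bits


print(encode_port_symmetric(80, 80))
print(encode_port_symmetric(50005, 50000))
print(encode_port_symmetric(80, 50000))
```

   Under these assumptions, a well-known server port that matches the
   base context costs 2 bits, whereas a two-alternative STATIC |
   IRREGULAR format such as the one in Table 10 needs at most one
   discriminator bit; the extra bit is the price of symmetry.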

   The asymmetrical behaviour of the Window field may be handled
   efficiently using the encoding method proposed in [TCP-WIN], which
   can be either symmetrical or asymmetrical.


7.  Security Considerations

   This document does not raise any new security considerations.


8.  References

   [RFC-3390]  Allman, M., Floyd, S. and C. Partridge, "Increasing
               TCP's Initial Window", RFC 3390, October 2002.

   [RFC-3095]  Bormann, C., Burmeister, C., Degermark, M., Fukushima,
               H., Hannu, H., Jonsson, L-E., Hakenberg, R., Koren, T.,
               Le, K., Liu, Z., Martensson, A., Miyazaki, A., Svanbro,
               K., Wiebke, T., Yoshimura, T. and H. Zheng, "RObust
               Header Compression (ROHC): Framework and four profiles:
               RTP, UDP, ESP, and uncompressed", RFC 3095, July 2001.

   [RFC-2581]  Allman, M., Paxson, V. and W. Stevens, "TCP Congestion
               Control", RFC 2581, April 1999.

   [RFC-2234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for
               Syntax Specifications: ABNF", RFC 2234, November 1997.

   [RFC-1122]  Braden, R., Ed., "Requirements for Internet Hosts --
               Communication Layers", RFC 1122, October 1989.

   [ROHC-TCP]  Pelletier, G., Zhang, Q., Jonsson, L-E., Liao, H., West,
               M., "RObust Header Compression (ROHC): TCP/IP Profile
               (ROHC-TCP)", Internet Draft (work in progress), <draft-
               ietf-rohc-tcp-04.txt>, May 2003.

   [TCP-BEH]   West, M. and S. McCann, "TCP/IP Field behavior", Internet
               Draft (work in progress), <draft-ietf-rohc-tcp-field-
               behavior-02.txt>, March 2003.

   [ROHC-CR]   Pelletier, G., "RObust Header Compression (ROHC): Context
               Replication for ROHC Profiles", Internet Draft (work in
               progress), <draft-ietf-rohc-context-replication-01.txt>,
               October 2003.

   [ROHC-FN]   Price, R., et al., "Formal Notation for Robust Header
               Compression (ROHC-FN)", Internet Draft (work in progress),
               <draft-ietf-rohc-formal-notation-01.txt>, March 2003.

   [EPIC-LITE] Price, R., Hancock, R., McCann, S., Surtees, A., Ollis,
               P. and M. West, "Framework for EPIC-LITE", Internet Draft
               (work in progress), <draft-ietf-rohc-epic-lite-01.txt>,
               2002.

   [EPIC-IMPL] Vidjak, L., Stula, M. and J. Ozegovic, "Program
               Structures for EPIC-LITE Experimental Implementation",
               SoftCOM 2002.

   [TCP-WIN]   Cho, C.Y. and S.K. Hazra, "Encoding Method for TCP Window
               in Context Replication", Internet Draft, to be submitted.


9.  Authors' Addresses

   Chia Yuan Cho
   Institute for Infocomm Research (I2R)
   21 Heng Mui Keng Terrace
   Singapore 119613

   Phone: +65 6874 6643
   Email: stucyc2@i2r.a-star.edu.sg

   Sukanta Kumar Hazra
   Institute for Infocomm Research (I2R)
   21 Heng Mui Keng Terrace
   Singapore 119613

   Phone: +65 6874 1953
   Email: sukanta@i2r.a-star.edu.sg


Appendix A.  State Transition Threshold

   The aim of this appendix is to determine a reasonable range for the
   State Transition Threshold, defined as the number of initial TCP/IP
   packets that may be converted into IR or IR-CR packets.

   The compressor state machine controls the type of packet transmitted
   to the decompressor. As elaborated in [ROHC-TCP], the compressor
   transitions from the IR/CR state to the CO state either
   optimistically or explicitly upon receiving an ROHC ACK from the
   decompressor. Because at least one IR/IR-CR packet must be sent
   before this transition, the State Transition Threshold H satisfies
   H >= 1. The State Transition Threshold differs from the number of
   context-initializing IR/IR-CR packets sent, because in
   uni-directional mode or optimistic bidirectional mode a single TCP/IP
   packet may be sent as several duplicate IR/IR-CR packets (to allow
   the compressor to gain the optimism necessary for upward transition).
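   To illustrate the distinction between H and the number of IR/IR-CR
   packets sent, the following Python sketch models an optimistic
   compressor. It assumes, purely for illustration, that the compressor
   leaves the IR/CR state once it has sent a fixed number of IR/IR-CR
   packets (the hypothetical parameter optimism_threshold) and that the
   number of duplicates sent per TCP/IP packet is known in advance;
   neither value is specified by [ROHC-TCP] or by this document.

```python
from typing import Sequence, Tuple


def count_ir_packets(duplicates_per_packet: Sequence[int],
                     optimism_threshold: int) -> Tuple[int, int]:
    """Return (H, IR/IR-CR packets sent) for a hypothetical optimistic
    compressor.

    duplicates_per_packet[i]: copies of TCP/IP packet #(i + 1) sent as
        IR/IR-CR packets while the compressor is in the IR/CR state.
    optimism_threshold: hypothetical number of IR/IR-CR packets after
        which the compressor optimistically moves to the CO state.
    """
    if optimism_threshold < 1:
        raise ValueError("at least one IR/IR-CR packet must be sent")
    h = 0     # TCP/IP packets converted into IR/IR-CR packets
    n_ir = 0  # IR/IR-CR packets actually transmitted
    for copies in duplicates_per_packet:
        h += 1
        n_ir += copies
        if n_ir >= optimism_threshold:
            break  # optimistic transition to the CO state
    # If the sequence ends first, the compressor is still in IR/CR state.
    return h, n_ir


print(count_ir_packets([3], 3))           # (1, 3)
print(count_ir_packets([1, 1, 1, 1], 3))  # (3, 3)
```

   In the first call three IR/IR-CR packets are sent, yet H = 1 because
   all of them carry the first TCP/IP packet; in the second call each
   packet is sent once, so H equals the number of IR/IR-CR packets.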

   A range of suitable values for H is derived from the nature of the
   protocol stack and the channel characteristics. For the TCP/IP
   protocol stack, we begin by looking at the first few packets
   exchanged in a TCP connection.

   Fig. 4 shows a TCP connection using TCP/IP header compression over a
   low-bandwidth channel. Packets in the forward direction are numbered.
   The first TCP packet is always converted into an IR/IR-CR packet. In
   the following analysis, we focus on the compressor at the client and
   the decompressor at the router.

   Suppose the channel is full-duplex, and an ROHC ACK is sent upon the
   successful decompression of the first packet. ROHC ACKs may be
   piggybacked. The earliest possible ROHC ACK is indicated in Fig. 4
   as a dotted arrow. When the compressor receives the ROHC ACK, it
   transitions from the IR/CR state to the CO state and subsequently
   sends CO packets instead. If the channel is reliable, the compressor
   receives the ROHC ACK before it sends the second TCP/IP packet, so
   only a single TCP/IP packet becomes an IR/IR-CR packet, i.e. H = 1.
   This is also likely if the router-server RTT >> client-router RTT;
   in that case, even if the first ROHC ACK is lost, the compressor has
   ample opportunity to receive retransmitted ROHC ACKs before it sends
   packet #2. Conversely, if the channel is unreliable, and/or if the
   client-router RTT >> router-server RTT (as is likely the case for
   cellular links), the ROHC ACK is likely not received immediately,
   and subsequent TCP/IP packets are still sent as IR-CR packets.
   However, as seen from Fig. 4, the time lapse between TCP/IP packet
   #1 and packet #4 is long compared to the gaps between subsequent
   packets (once the TCP sliding window mechanism takes effect), so it
   is reasonable to assume that the ROHC ACK is received before packet
   #4 is sent. At most packets #1 to #3 are then converted into
   IR/IR-CR packets, and a reasonable range is 1 <= H <= 3.
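   The reasoning above can be sketched as a simple timeline count: H is
   the number of TCP/IP packets the compressor sends before the first
   ROHC ACK arrives, and never less than 1. The function name and the
   send and ACK arrival instants below are hypothetical values chosen
   only to mirror the best and worst cases of Fig. 4; they are not
   measurements.

```python
from typing import Optional, Sequence


def state_transition_threshold(send_times: Sequence[float],
                               ack_arrival: Optional[float]) -> int:
    """Return H, the number of TCP/IP packets sent as IR/IR-CR packets.

    send_times: send instants (seconds) of TCP/IP packets #1, #2, ...
    ack_arrival: instant the compressor receives the first ROHC ACK,
        or None if no ACK arrives while these packets are sent.
    """
    if not send_times:
        raise ValueError("packet #1 must be sent before any transition")
    if ack_arrival is None:
        # No ACK yet: every packet so far is still an IR/IR-CR packet.
        return len(send_times)
    # Packets sent strictly before the ACK arrives are still IR/IR-CR.
    sent_before_ack = sum(1 for t in send_times if t < ack_arrival)
    # Packet #1 is always an IR/IR-CR packet, so H >= 1.
    return max(1, sent_before_ack)


# Hypothetical instants for packets #1 (SYN), #2 (ACK), #3 (request)
# and #4 (sent after the reply), loosely following Fig. 4.
times = [0.0, 0.60, 0.61, 1.80]
best_case = state_transition_threshold(times, 0.30)   # ACK before #2
worst_case = state_transition_threshold(times, 1.50)  # ACK before #4
print(best_case, worst_case)  # 1 3
```

   Because packets #2 and #3 are sent almost back to back while the gap
   before packet #4 spans a full request/reply exchange, any ACK arrival
   within that gap yields H = 3, which is the upper end of the range.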

                  Client    Router   Server
                    |         |         |
                SYN |--- #1   |         |
                    |   ---   |         |
                    |      -->|---      |
                    |      ...|   ---   |
                    |   ...   |      -->|
     +--  ROHC ACK  |<..      |      ---| SYN,ACK
     |  (best case) |         |   ---   |
     |              |      ---|<--      |
     |              |   ---   |         |
     |              |<--      |         |
     |          ACK |--- #2   |         |
     |              |   ---   |         |
     |      request |--- #3-->|---      |
     |              |   ---   |   ---   |
     |              |      -->|---   -->|
     | large        |         |   ---   |
     | time         |         |      -->|
     | lapse        |         |      ---| reply
     |              |         |   ---   |
     |              |      ---|<--      |
     |              |   ---   |         |
     +--(worst case)|<--      |         |
                    |--- #4   |         |
                    |   ---   |         |
                    |      -->|---      |
                    |         |   ---   |
                    |         |      -->|
               Compressor  Decompressor

                    |_________|_________|
                        Low      Wired
                     Bandwidth     or
                      Channel   Wireless

   Fig. 4: TCP handshaking and ROHC ACKs

   Finally, because TCP/IP traffic is bi-directional, header compression
   may be applied in both directions; in this case the overall state
   transition threshold is Ho = 2H. For uni-directional protocol stacks
   like RTP/UDP/IP, the overall state transition threshold Ho remains H.
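   As a minimal sketch, the relation between H and Ho can be written as
   the hypothetical helper below, assuming the same H applies in each
   compressed direction. Combined with 1 <= H <= 3 for TCP, it gives
   2 <= Ho <= 6 when compression is applied in both directions.

```python
def overall_threshold(h: int, bidirectional: bool) -> int:
    """Overall state transition threshold Ho for a per-direction H.

    Assumes the same H applies independently in each compressed
    direction.
    """
    if h < 1:
        raise ValueError("H must be at least 1")
    return 2 * h if bidirectional else h


# TCP/IP compressed in both directions, using the range 1 <= H <= 3.
tcp_range = [overall_threshold(h, bidirectional=True) for h in range(1, 4)]
# RTP/UDP/IP compressed in one direction only.
rtp_range = [overall_threshold(h, bidirectional=False) for h in range(1, 4)]
print(tcp_range, rtp_range)  # [2, 4, 6] [1, 2, 3]
```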
