Robust Header Compression                               Peter J. McCann
  INTERNET DRAFT                                               Tom Hiller
  Document: draft-mccann-rohc-gehcoarch-02.txt        Lucent Technologies
                                                               June, 2001
  
  
      Requirements and Architecture for Header Stripping and Generation
  
  
  Status of this Memo
  
     This document is an Internet-Draft and is in full conformance with all
     provisions of Section 10 of RFC2026 [Bradner96].
  
     Internet-Drafts are working documents of the Internet Engineering Task
     Force (IETF), its areas, and its working groups. Note that other groups
     may also distribute working documents as Internet-Drafts.
  
     Internet-Drafts are draft documents valid for a maximum of six months
     and may be updated, replaced, or obsoleted by other documents at any
     time. It is inappropriate to use Internet-Drafts as reference material
     or to cite them other than as "work in progress."
  
     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/ietf/1id-abstracts.txt
  
     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html.
  
  
  1. Abstract
  
     Efficient transmission of voice over wireless links requires
     significant engineering effort.  Because of the high cost of bandwidth
     on such links, special techniques for compression of voice data and its
     transmission over the air have been developed.  The compression
     techniques and the wireless physical layers have been co-designed for
     maximum spectral efficiency and human perceptual euphony.
  
     Voice over IP (VOIP) applications should be able to leverage this
     engineering effort when used over wireless links.  We advocate a
     "header stripping and generation" approach to this problem in order to
     enable the end-to-end service model while achieving maximum spectral
     efficiency and simplicity of implementation.  This document outlines an
     architectural framework for a wireless VOIP application, including the
     wireless link layer and its interface to typical IP stack
     implementations, and discusses the protocol elements that should be
     standardized between the various components.
  
  
  
  
  
  
  McCann, Hiller             Expires 08/2001                          1
  
  
                                GEHCOARCH                February, 2001
  
  
  2. Introduction
  
     Voice over IP (VOIP) promises to change radically the way that
     telephony services are built and delivered.  Integration of voice with
     the Internet will not just be a change in the way traffic is carried;
     rather, new types of services will be made possible by the integration
     of voice with existing Internet applications such as the World Wide Web
     and e-mail.  The key to these new services will be a platform that
     offers open programmability while providing a transport for VOIP in an
     integrated, robust, and efficient way.
  
     Wireless links offer great challenges to the transport of voice
     traffic, and significant engineering effort has gone into making them
     efficient for circuit voice applications.  New voice compression
     algorithms ("codecs"), such as EVRC [TIA-IS127], SMV [TIA-SMV], or AMR
     [ETSI-AMR] have been developed to minimize the amount of data that must
     be carried, and special over-the-air channels have been implemented to
     carry these codecs with a minimum of overhead bits and minimal latency.
  
     VOIP flows will be carried inside the Real-time Transport Protocol
     (RTP) [Shulzrinne96] on wired links.  However, for wireless links, the
     situation is less clear.  The limited bandwidth of wireless links makes
     it impossible to transmit the entire IP/UDP/RTP header with every
     packet, as the overhead would be prohibitive.  It is possible to
     compress these headers by transmitting only updates to the fields that
     change rather than the entire header [Bormann01], but these compression
     schemes are complex and can never entirely eliminate the overhead due
     to RTP.  Even when the header is compressed down to one byte per frame
     on average, the impact on spectral capacity is significant.  Also, the
     variable-sized frames produced by these compression protocols are
     unsuitable for typical wireless links that support only a limited
     number of frame sizes.
  
     The fundamental reason why these schemes cannot achieve the same
     efficiency as circuit data is that they discard information that is
     available at the physical channel layer, including the real-time nature
     of the traffic, which can assist in reconstructing the RTP header.
     This document describes an architectural framework that allows such
     real-time information to be used while not restricting the choice of
     call control protocol, placement of call feature servers, or mobile
     station architecture.
  
     All work to date on header compression has taken as a basic requirement
     that it will operate with no knowledge about the applications
     generating the compressible packets, and therefore when a packet is
     compressed and then decompressed, the result must be bit-for-bit
     identical with the original packet.  However, we argue that many
     applications, especially those that are only concerned with
     transmission and playback of voice, can tolerate some amount of skew in
     the reproduced RTP headers.  When a compressor/decompressor pair can
     make these assumptions, very simple and efficient header compression
     can be performed.  Our architecture allows applications to indicate
     their ability to tolerate such skew, and we discuss the conditions
     under which applications may do so.  This allows us to implement a form
     of header compression that makes use of existing circuit voice
     implementations with minimal changes; we refer to this approach as
     "header stripping and generation."
  
  
  3. Wireless Technology Considerations
  
     Cellular wireless technologies will support distinct bearer channels
     for real-time audio flows versus non-real-time data.  Data for TCP,
     such as web or e-mail traffic, will suffer from the lossy nature of the
     wireless link unless a link-layer retransmission protocol is used to
     improve its reliability.  Such a retransmission protocol (called the
     Radio Link Protocol or RLP in the emerging cellular data networks) does
     improve reliability but only at the expense of additional buffering and
     latency [Fairhurst01].  Real-time audio streams cannot tolerate the
     additional latency, which could be on the order of 1 second under
     adverse radio conditions.  For this reason, a separate bearer channel
     will be used for voice that does not perform retransmission.  This
     bearer will be very similar to the existing circuit voice channels.
     The architecture outlined below allows the mobile station to make
     effective use of this channel for VOIP.
  
     In addition to the two types of channels outlined above, there are
     likely to be intermediate kinds of channels intended to carry various
     kinds of IP multimedia data.  This data is somewhat more sensitive to
     delay and less sensitive to loss than ordinary packet data, but is
     unable to use the underlying link framing in the same manner as circuit
     voice.  Such traffic will need to be carried in, e.g., HDLC, and will
     be transported over an RLP that does few or no retransmissions to avoid
     introduction of buffering and delay.
  
     Because of these multiple channel types, each endpoint of the link will
     need to know when to establish new channels and how to properly
     allocate packet flows to channels.  Applying heuristics to guess which
     flows should be allocated to which channels will not be acceptable; in
     this case, unlike the heuristics used for bit-wise transparent header
     compression, guessing wrong will do harm to application flows because
     they will not be able to meet their real-time requirements.
  
     The architecture of a mobile station should allow maximum flexibility
     in its hardware and software choices.  Two basic mobile station models
     have been identified in the wireless data community.  A "network model"
     station is one that is completely integrated, such as a phone plus
     browser or a palmtop with integrated radio hardware.  Such a device
     usually has a real-time operating system, a DSP chip for processing the
     audio codec, and an embedded IP stack implementation.  In contrast, a
     "relay model" station is one that is split in two: it consists of a
     piece of terminal equipment (such as a laptop computer) connected to a
     piece of radio equipment, usually by a serial connection.  The idea is
     to make use of the mostly stock operating system on the terminal
     equipment, while "relaying" the data to and from the wireless network
     via the radio equipment.  We take the point of view that VOIP
     applications must be supported for both kinds of mobile stations.
     While network model phones will offer a tightly integrated set of
     services, relay model stations are likely to offer a much more open and
     programmable environment on the terminal equipment.  As these devices
     evolve we expect the distinction between network and relay models will
     blur as the wireless device moves closer to the UNIX notion of a
     "network interface" to a stock operating system, and the operating
     system evolves to take on more real-time functionality.
  
     The relay and network model terminals are both endpoints that host
     applications and terminate the complex wireless link described above.
     In other words, the compressor and decompressor always operate over
     the last-hop link.  This makes it simple for applications to express
     their preferences to the compressor for how their packets should be
     treated; an application can use a local software API for communicating
     with the local compressor, and link-layer signaling can communicate
     these preferences to the remote compressor one hop away.  While we
     expect this will cover the vast majority of cases, there may also be a
     requirement to support a router with this type of wireless link
     interface, such as a phone that is acting as an IP-layer gateway for
     many IP devices carried by a single user.  If voice flows are expected
     to originate from such devices, then new signaling protocols must be
     used to indicate to the (now remote) compressor how packets should be
     treated.  Note that this not only includes the application's preference
     for transparency, but also which kind of underlying wireless channel
     should be used to carry the traffic.  Previous header compression
     schemes have relied on heuristics to recognize which flows are RTP
     traffic; because they were bit-for-bit transparent, they claimed that
     choosing incorrectly did no harm to applications.  Because we relax
     this bit-for-bit transparency requirement, we must be sure that a flow
     belongs to an application that can tolerate skew.  Otherwise, the skew
     could do harm to applications.  However, note that sending a packet
     over an incorrect wireless channel would also do harm, because the
     real-time performance needs of the application will not be met.  We
     presume that some form of IP-layer signaling must be used to inform
     routers how to allocate flows to channels, and we propose that
     application transparency requirements can be carried at the same time.
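
     As an illustration of explicit allocation, the following is a minimal
     sketch of a local table through which applications register how their
     flows should be treated.  The names FlowTable, register_flow, classify,
     and Treatment are hypothetical and not part of any existing API, and
     the default treatment for unregistered flows is our assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple

# (source address, source port, destination address, destination port)
FlowId = Tuple[str, int, str, int]


class Channel(Enum):
    RELIABLE_DATA = "RLP with retransmission"
    MULTIMEDIA = "RLP with few or no retransmissions"
    VOICE = "voice bearer without retransmission"


@dataclass(frozen=True)
class Treatment:
    channel: Channel
    tolerates_skew: bool  # True only if the application said so explicitly


# Assumed default: bit-for-bit transparent handling on the data channel.
DEFAULT = Treatment(Channel.RELIABLE_DATA, tolerates_skew=False)


class FlowTable:
    """Explicit flow-to-channel allocations; nothing is guessed from packets."""

    def __init__(self) -> None:
        self._flows: Dict[FlowId, Treatment] = {}

    def register_flow(self, flow: FlowId, treatment: Treatment) -> None:
        # Called by the application (or by signaling on its behalf).
        self._flows[flow] = treatment

    def classify(self, flow: FlowId) -> Treatment:
        # Unregistered flows never receive non-transparent treatment.
        return self._flows.get(flow, DEFAULT)
```

     Because classify() only consults registered entries, a flow is never
     subjected to skew or placed on the voice bearer unless its application
     asked for that treatment.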
  
  
  4. Requirements
  
  
     In this section we examine the environment in which zero-byte header
     compression is expected to operate, including the required efficiency,
     assumptions about applications, and concerns about simplicity.
  
  
  
  4.1 Efficiency
  
     Approximate voice activity factors (probability distribution of frame
     sizes) for the Selectable Mode Vocoder (SMV) are given in Figure 1.
     These reflect one party's activity during a typical two-way interactive
     voice call.
  
               Rate           Activity %     Payload (bits)
  
               Full              20              171
               Half              20               80
               Quarter           10               40
               Eighth            50               16
  
          Figure 1: Activity of the 3GPP2 Selectable Mode Vocoder
  
     This vocoder is designed to operate synchronously with the underlying
     physical channel: it outputs one of the above frame sizes every 20
     milliseconds.  Which frame size is output depends on the
     characteristics of the speech being compressed; typically, full-rate
     (171 bit) frames are used during active talk spurts, interspersed with
     half- and quarter-rate frames as needed.  Eighth-rate frames are used
     mainly during silence periods, but they also contain information about
     the noise components present in the silence, which is referred to as
     "comfort noise generation".  Also, the physical link typically requires
     that some frame be transmitted during every 20ms interval so that power
     control can be maintained, and the eighth-rate frames play this role.
  
     The cdma2000 air interface has been designed with these frame sizes in
     mind, to support optimal transport of circuit voice.  It is not
     possible to perform a marginal adjustment to the frame sizes to
     accommodate header overhead.  This makes application of the basic ROHC
     RTP profile problematic at best: if one byte of LSB-encoded sequence
     number is added to a frame, it must be carried in the next-higher frame
     format.  For a full-rate frame, there is no next-higher frame format
     and so those frames could not be transported without breaking the
     synchronization with the underlying physical link and introducing
     additional framing, for example with the use of PPP HDLC flags or the
     ROHC segmentation mechanism.  This would introduce another 1 or 2 bytes
     of overhead per frame, and would also have a multiplier effect on the
     frame error rate since most vocoder frames would now span two physical
     frames.  Finally, this lack of synchronization would introduce an
     occasional lag between the vocoded frame time and real time that could
     add to the end-to-end latency and jitter of the RTP flow.
  
     Even a very conservative calculation, assuming these problems can be
     overcome and ignoring the contribution from 16-bit eighth-rate frames,
     yields an additional 400 bits per second from the header and
     segmentation overheads.  Compared to the average 3720 bps circuit voice
     rate, this overhead (greater than 10%) would significantly diminish the
     number of calls that can be handled in a given amount of spectrum.  We
     conclude that because the codec and physical link have been co-
     engineered to such tight tolerances, we should endeavor to use the
     vocoder/physical link largely unchanged from its existing
     implementation for circuit voice.
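
     The arithmetic behind the 400 bps figure can be reproduced with a short
     sketch.  It assumes (our reading of the calculation, not stated
     explicitly above) one byte of compressed header plus one byte of
     framing or segmentation overhead on every frame that is not
     eighth-rate.  The activity factors are those of Figure 1, and the
     3720 bps circuit voice rate is taken from the text as given.

```python
FRAMES_PER_SECOND = 50      # one vocoder frame every 20 ms
CIRCUIT_VOICE_BPS = 3720    # average circuit voice rate quoted in the text

# Activity factors from Figure 1, in percent of frames at each rate.
ACTIVITY_PERCENT = {"full": 20, "half": 20, "quarter": 10, "eighth": 50}

# Assumption: 1 byte of header plus 1 byte of framing per loaded frame.
OVERHEAD_BITS_PER_FRAME = 2 * 8


def overhead_bps() -> float:
    """Added bits per second, ignoring eighth-rate frames."""
    loaded_percent = sum(
        pct for rate, pct in ACTIVITY_PERCENT.items() if rate != "eighth"
    )
    return loaded_percent * FRAMES_PER_SECOND * OVERHEAD_BITS_PER_FRAME / 100


if __name__ == "__main__":
    extra = overhead_bps()
    share = extra / CIRCUIT_VOICE_BPS
    print(f"overhead: {extra:.0f} bps ({share:.1%} of circuit voice)")
```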
  
  
  4.2 Application Assumptions
  
     In order for real-time to serve as a proxy for the RTP sequence number,
     it must be the case that the sequence number increments by one for
     every physical layer epoch.  This would be true if the transmitter
     sends a vocoded frame for every epoch, as is done by the existing
     cdma2000 vocoders even during silence intervals.  Note that in 3G
     systems the mobile node transmits continuously even during silence so
     that the network may monitor power.  Note also that these frames are
     not empty; they do carry information about the background noise
     components during silence, known as "comfort noise".
  
     We explicitly relax the assumption that reconstructed headers at the
     decompressor are bit-for-bit identical to the headers seen by the
     compressor.  Specifically, we note that for most VOIP applications, the
     RTP sequence number and timestamp are primarily used to schedule frames
     for playback over a relatively short interval.  Implementations
     typically maintain a playback buffer of a few frames, and place
     incoming voice samples into that buffer based on their timestamp and
     sequence number.  Based on a running average of the buffer depth,
     frames are discarded or silence is inserted according to whether the
     buffer is too full or is running low, respectively.  Such a playback
     buffer only needs the timestamps and sequence numbers to be relatively
     accurate; that is, over short timescales, neighboring frames should
     have neighboring timestamps and sequence numbers.  Any small, fixed
     skew that is introduced into the packet stream will be quickly
     corrected by the playback buffer mechanism.
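
     To illustrate why a fixed skew is harmless to such a buffer, the
     following is a simplified sketch that places frames by sequence number
     relative to the first frame received.  It is not a complete playback
     buffer: depth averaging, discard, and silence insertion are omitted,
     and it assumes the first frame to arrive is the earliest one.  The
     class and function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide


class PlaybackBuffer:
    """Orders frames by sequence number relative to the first frame seen."""

    def __init__(self) -> None:
        self._base: Optional[int] = None
        self._slots: Dict[int, bytes] = {}

    def insert(self, seq: int, frame: bytes) -> None:
        if self._base is None:
            self._base = seq
        # Relative position; modular arithmetic handles 16-bit wraparound.
        self._slots[(seq - self._base) % SEQ_MOD] = frame

    def playout(self) -> List[Tuple[int, bytes]]:
        return sorted(self._slots.items())


def play(packets: List[Tuple[int, bytes]]) -> List[Tuple[int, bytes]]:
    """Feed packets in arrival order and return them in playout order."""
    buf = PlaybackBuffer()
    for seq, frame in packets:
        buf.insert(seq, frame)
    return buf.playout()


def add_skew(packets: List[Tuple[int, bytes]], offset: int):
    """Apply the same fixed offset to every sequence number."""
    return [((seq + offset) % SEQ_MOD, frame) for seq, frame in packets]
```

     Since only differences between sequence numbers are used, play(p) and
     play(add_skew(p, k)) return the same schedule for any fixed k.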
  
     However, not every application will be able to tolerate such skew.
     Defining "non-transparent compression" as any compression that changes
     bits end-to-end, we could make the following statements:
  
       1. The end-system MUST be aware of any non-transparent compression.
       2. The end-system MUST be able to turn off non-transparent
          compression if it chooses.
  
     Depending on the application semantics for each header field, we can
     classify that field as follows:
  
     BRITTLE         These fields are those that must be reconstructed by
                     the decompressor so that they match bit-for-bit those
                     seen at the compressor.
  
     PLIANT          These fields are those for which the application can
                     tolerate some form of skew.
  
     Note that BRITTLE fields can be either STATIC or CHANGING, where those
     terms are used as in Appendix A of ROHC [Bormann01].
  
     For simplicity, we assume that all PLIANT fields are CHANGING.  We note
     that STATIC fields are easy to communicate precisely at initialization
     time, so classifying such a field as PLIANT would not ease the
     compression/decompression task.  However, we note that for the specific
     application of wireless vocoders, we can often make stronger
     assumptions about what fields are static.  For example, the Marker Bit
     may never be used for EVRC in wireless applications [Li01].  This lets
     us assume this field is STATIC.
  
     The PLIANT fields can be further classified according to what kinds of
     skew can be tolerated by applications.  For example, a RARELY-CHANGING
     (RC) [Bormann01] field is updated infrequently and thereafter keeps its
     new value.  We note that for most RC fields, it is not mandatory that
     such a change reach the receiver in the exact packet in which the
     sender made it.  For example, a CSRC list can be updated in a somewhat
     asynchronous manner; if the update is applied a few packets earlier or
     later, application semantics will not be affected.  Similarly, a
     SEMISTATIC [Bormann01] field such as one used for congestion
     notification does not need to be precisely synchronized with the
     original packet in which it was set; as long as the receiver gets the
     congestion notice in a reasonable amount of time it can take
     appropriate action.  We refer to such fields as RC-PLIANT or SS-PLIANT.
  
     For fields like the RTP Timestamp and Sequence Number, we introduce a
     new term:
  
     OFFSET-PLIANT     These fields may be changed by some offset in the
                       compression/decompression process.  These fields have
                       a STATIC delta and are incremented by that delta with
                       each packet.  The precise offset of the decompressor
                       from the compressor is itself a RARELY-CHANGING
                       value.
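
     A minimal sketch of how a decompressor might regenerate an
     OFFSET-PLIANT field from a count of physical layer epochs follows.
     The class name, its methods, and the example delta of 160 timestamp
     units per frame (a 20 ms frame at an 8 kHz RTP clock) are assumptions
     made for illustration only.

```python
from dataclasses import dataclass

SEQ_MOD = 1 << 16   # RTP sequence number width
TS_MOD = 1 << 32    # RTP timestamp width


@dataclass
class OffsetPliantField:
    """A field regenerated as initial + delta * epoch, modulo its width."""

    initial: int    # value supplied when the context is set up
    delta: int      # STATIC per-packet increment
    modulus: int

    def value_at(self, epoch: int) -> int:
        # epoch counts 20 ms physical layer frames since context setup.
        return (self.initial + self.delta * epoch) % self.modulus

    def reoffset(self, epoch: int, value: int) -> None:
        """Rare update: make the field equal `value` at `epoch`."""
        self.initial = (value - self.delta * epoch) % self.modulus


if __name__ == "__main__":
    seq = OffsetPliantField(initial=65530, delta=1, modulus=SEQ_MOD)
    ts = OffsetPliantField(initial=0, delta=160, modulus=TS_MOD)
    for epoch in (0, 5, 10):
        print(epoch, seq.value_at(epoch), ts.value_at(epoch))
```

     No per-packet state is exchanged in this sketch; only reoffset(),
     which corresponds to the RARELY-CHANGING offset, would require
     signaling between compressor and decompressor.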
  
     Note that some applications may impose semantics on fields that make
     them BRITTLE.  For example, if SRTP is in use [Blom01] the sequence
     number and/or timestamp must be matched precisely to the encrypted,
     vocoded frame.  Also, if RTCP is being used to estimate round-trip
     time, these estimates will be perturbed by the offset amount.
     Applications may be able to tolerate different amounts of offset and it
     may be important in the future to characterize the amount of offset
     introduced by a particular implementation; however, for now we take a
     purely qualitative approach.
  
     Table 1 lists the CHANGING fields from the basic ROHC RTP profile
     [Bormann01].  For each, we give the application assumptions on pliancy
     that must hold for a header stripping/generation approach to preserve
     IP and application semantics.
  
       +------------------------+-------------+--------------------------+
       |         Field          | Assumption  |  Note                    |
       +========================+=============+==========================+
       | IPv4 Id:    Sequential |   PLIANT    | Start w/initial context  |
       +------------------------+-------------+--------------------------+
       | IP TOS / Tr. Class     |  RC-PLIANT  | Probably never updated   |
       +------------------------+-------------+--------------------------+
       | IP TTL / Hop Limit     |  RC-PLIANT  | Unimportant for last-hop |
       +------------------------+-------------+--------------------------+
       | UDP Checksum: Disabled |   STATIC    | Checksum always disabled |
       +------------------------+-------------+--------------------------+
       |                 No mix |   STATIC    |                          |
       | RTP CSRC Count: -------+-------------+--------------------------+
       |                 Mixed  |  RC-PLIANT  | Update need not be sync. |
       +------------------------+-------------+--------------------------+
       | RTP Marker             |   STATIC    | Disable for EVRC         |
       +------------------------+-------------+--------------------------+
       | RTP Payload Type       |   STATIC    |                          |
       +------------------------+-------------+--------------------------+
       | RTP Sequence Number    |OFFSET-PLIANT|                          |
       +------------------------+-------------+--------------------------+
       | RTP Timestamp          |OFFSET-PLIANT|                          |
       +------------------------+-------------+--------------------------+
       |                 No mix |      -      |                          |
       | RTP CSRC List:  -------+-------------+--------------------------+
       |                 Mixed  |  RC-PLIANT  | Update need not be sync. |
       +------------------------+-------------+--------------------------+
  
           Table 1: Assumptions on the CHANGING header fields necessary for
                    header stripping and generation.
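
     The assumptions of Table 1 can also be written down as data and checked
     mechanically.  The sketch below is illustrative only; the enumeration,
     the BRITTLE_CHANGING label (for a field that changes and must be
     reproduced bit-for-bit), and the function name are hypothetical.

```python
from enum import Enum
from typing import Dict


class Pliancy(Enum):
    STATIC = "STATIC"
    PLIANT = "PLIANT"
    RC_PLIANT = "RC-PLIANT"
    OFFSET_PLIANT = "OFFSET-PLIANT"
    BRITTLE_CHANGING = "BRITTLE and CHANGING"  # does not appear in Table 1


# Rows of Table 1; the "No mix" CSRC List row has no entry and is omitted.
TABLE_1: Dict[str, Pliancy] = {
    "IPv4 Id (sequential)": Pliancy.PLIANT,
    "IP TOS / Traffic Class": Pliancy.RC_PLIANT,
    "IP TTL / Hop Limit": Pliancy.RC_PLIANT,
    "UDP Checksum (disabled)": Pliancy.STATIC,
    "RTP CSRC Count (no mix)": Pliancy.STATIC,
    "RTP CSRC Count (mixed)": Pliancy.RC_PLIANT,
    "RTP Marker": Pliancy.STATIC,
    "RTP Payload Type": Pliancy.STATIC,
    "RTP Sequence Number": Pliancy.OFFSET_PLIANT,
    "RTP Timestamp": Pliancy.OFFSET_PLIANT,
    "RTP CSRC List (mixed)": Pliancy.RC_PLIANT,
}


def can_strip_headers(assumptions: Dict[str, Pliancy]) -> bool:
    """True when no field must be reproduced bit-for-bit in every packet."""
    return all(
        p is not Pliancy.BRITTLE_CHANGING for p in assumptions.values()
    )
```

     An application that makes the sequence number BRITTLE, as in the SRTP
     case noted earlier, would change that row to BRITTLE_CHANGING and the
     check would fail.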
  
  
     We re-classify the IPv4 Identification field as PLIANT.  We assume that
     IPv4 Identifiers can be generated at the decompressor by incrementing
     from an initial value supplied by the compressor.  RTP packets should
     not be fragmented, and the risk of an IPv4 Identifier collision with
     another fragmented packet should be negligible.  If not, then we assume
     at least that Identifiers are taken from a contiguous range and do not
     need to be encoded with every packet.  Only when a new range of
     Identifiers is chosen would an update need to be sent.  Note that it is
     not important for such Identifiers to be identical to the ones visible
     at the compressor, only that there be no collisions with other,
     fragmented packets.
  
     We re-classify the Traffic Class/TOS field as RC-PLIANT.  We note that
     for a given flow, it will probably never be updated unless Explicit
     Congestion Notification is in use.  ECN bits could be treated as
     RC-PLIANT or SS-PLIANT, based on future study.  It is not clear what the
     benefit of ECN will be for low-bitrate flows such as EVRC; such a codec
     will probably not respond to congestion notification.
  
     We re-classify the TTL/Hop Limit field as RC-PLIANT.  Note that for
     last-hop links, this field will be constant in the uplink direction and
     its value will be unimportant for the downlink direction, because IP
     forwarding will not be performed.  This field only needs to be updated
     if the header stripping/generation is operating over a non-last hop
     link where there is a potential for routing loops.  Even if there is
     the potential for routing loops, it is not necessary to update the TTL
     in a precisely synchronized way; a strategy of eager decrease/lazy
     increase, for example, would have the desired effect of stopping
     routing loops while not introducing too much update overhead.
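
     One possible reading of "eager decrease/lazy increase" is sketched
     below: the compressor signals a lower TTL as soon as it is seen, but
     raises the generated TTL only after a sustained run of higher values.
     The class name and the patience parameter are hypothetical, and the
     policy details are our assumption rather than a specified behavior.

```python
MAX_TTL = 255


class TtlContext:
    """Compressor-side decision on when to send a TTL update."""

    def __init__(self, initial_ttl: int, patience: int = 50) -> None:
        self.ttl = initial_ttl      # value the decompressor generates
        self.patience = patience    # packets to wait before raising
        self._higher_run = 0
        self._run_min = MAX_TTL

    def _reset_run(self) -> None:
        self._higher_run = 0
        self._run_min = MAX_TTL

    def observe(self, packet_ttl: int) -> bool:
        """Return True if an update carrying self.ttl should be sent."""
        if packet_ttl < self.ttl:
            # Eager decrease: follow a lower TTL immediately.
            self.ttl = packet_ttl
            self._reset_run()
            return True
        if packet_ttl > self.ttl:
            # Lazy increase: wait for a sustained run, then raise only to
            # the smallest value seen during that run.
            self._higher_run += 1
            self._run_min = min(self._run_min, packet_ttl)
            if self._higher_run >= self.patience:
                self.ttl = self._run_min
                self._reset_run()
                return True
            return False
        self._reset_run()
        return False
```

     With this policy the generated TTL stays at or below recently observed
     values, which preserves loop termination, while brief upward
     excursions cost no update traffic.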
  
     We re-classify the RTP Marker bit as STATIC for the applications of
     interest.  The purpose of the Marker bit is to indicate where silence
     may be inserted or removed in case of playback buffer overflow or
     underflow.  We note that wireless codecs will typically have their own
     methods of detecting silence, such as the use of low-rate frames.
  
     We re-classify the CSRC count and values as RC-PLIANT.  We assume that
     CSRCs are updated rarely, if at all, and so these updates can be
     carried over the sister reliable data link to the peer without imposing
     much additional overhead.  There is no need to synchronize them
     precisely with the packet in which they first appear.
  
     Under the above assumptions, all BRITTLE fields are STATIC.  This will
     allow header stripping/generation to work without adversely impacting
     end-to-end semantics.
  
  
  4.3 Simplicity
  
     A major goal of this document is to support transport of voice over
     existing cellular voice channels with few or no changes to the
     supporting radio access equipment.  Allowing a solution to completely
     strip out the header, transmitting only voice data on this channel,
     will significantly aid that goal.  By not imposing any new format
     requirements on the vocoded frames, we allow development of future
     codecs to proceed with maximum flexibility.
  
     The simplicity of the supporting header compression state machine must
     also be considered.  Wireless devices are likely to be limited in both
     power and memory budgets.  Network access servers, while they will be
     implemented on larger footprint equipment, will need to support large
     numbers of attached devices and so scalability is a key issue.  By
     decoupling the header initialization and updates from the synchronous
     voice traffic channel, it may be possible to achieve significant
     simplifications in the header compression protocol state machine.
  
  
  
  
  
  5. Reference Architecture
  
     Our reference architecture is shown in Figure 2.
  
  
      Remote VOIP    Other NRT       VOIP       Zero-Byte
       Application      Apps        Control------Control
            \              \         /              |
             \              \       /               |
              +-------------IP Protocol            /
                              Stack               /
                                |                /
                                |               /
       Header Comp/  ------Data Link-----------+           Peer
         Decomp      \        Layer                       System
                      \___      |                            |
                          \     |                            |
        Audio       Codec  \    +-------------->Physical<----+
       Hardware<--->Impl <--+------------------>Channel(s)<--+
  
      Figure 2.  Reference architecture for a system implementing
                 zero-byte header compression.
  
  
     The architecture diagram consists of nine components connected to a
     peer system by a collection of physical channels.  Note that we expect
     zero-byte header compression to be somewhat asymmetric in that it will
     usually be implemented between a mobile station, where the VOIP and
     other applications reside, and a peer network entity that is just a
     data link termination point and a first-hop Internet router.  As such,
     the peer system in the network will likely be missing the audio
     hardware and codec implementation, and may not participate in the VOIP
     control.  Also, the mobile station may not need to actually perform
     header compression and decompression if its codec implementation is
     connected directly to the physical channel, which may be required to
     achieve the desired latency guarantees.
  
     The component named "Zero-Byte Control" would consist of the protocol
     logic used to set up and maintain the zero-byte header compression
     context.
  
     In the following subsections we discuss each of the architectural
     elements in turn.  The next section will discuss the interfaces between
     them.
  
  
  5.1 Non Real-time Components
  
     It is important to distinguish between the real-time and non real-time
     components of Figure 2.  This is especially important for a relay model
     mobile station, as it impacts which elements of stock operating systems
     can be reused and which must be implemented as new real-time
     extensions.  In this subsection we examine the non real-time
     components.
  
  
  5.1.1 VOIP Control
  
     The VOIP control component is the implementation of the call signaling
     protocol, such as SIP [Handley00] or H.323 [ITU-H323].  We make no
     assumptions on which protocol is used, and we do not require the
     network-side peer system to contain this element.  The mobile station
     will use one of the VOIP signaling protocols to interact with call
     feature servers that could be anywhere on the Internet.
  
     We assume that this component will open network-layer connections and
     will have access to the transport endpoint identifiers for the
     IP/UDP/RTP flow.  However, we do not require this element to actually
     process audio data; it will probably be implemented in user-space and
     could add unpredictable latency to such flows, depending on operating
     system characteristics.
  
  
  5.1.2 Remote VOIP Application
  
     If the RTP generating application is remote from the physical link,
     i.e., there is at least one IP hop separating it from the compressor,
     then it will not have direct access to the zero-byte control component.
     Some network layer protocol must be introduced if it is to take
     advantage of zero-byte header compression.
  
  
  5.1.3 IP Protocol Stack Implementation
  
     We assume that the mobile station implements an IP protocol stack in
     conformance with RFC 1122 [Braden89].  Note that such an implementation
     may not be capable of supporting hard real-time tasks.
  
  
  5.1.4 Data Link Layer
  
     The data link layer is the interface between the IP protocol stack and
     the wireless network device.  For cdma2000, this will be PPP [TIA-
     IS835].  For GPRS, this will be LLC [ETSI-LLC], and for UMTS, this will
     be PDCP [ETSI-PDCP].
  
     For cdma2000, we assume a mostly stock PPP implementation for
     interaction with the physical channels that support data and perform
     retransmission.  However, because the data link layer may not be a hard
     real-time component, we would not require it to be on the audio traffic
     path inside the mobile station.
  
  
  5.1.5 Zero-Byte Control
  
     The Zero-Byte Control component is responsible for negotiating the use
     of header stripping/generation with the peer system and for setting up
     context information such as the fixed portion of the IP/UDP/RTP header.
     It will interact with the VOIP control component to acquire these
     parameters, and will send them across the data link layer to the peer
     system.  It will also interact with the wireless device (possibly
     through the data link layer) to establish the physical audio channels
     and will identify the channel to be used when sending context
     information to the peer system.
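
     As an illustration only, the context information handled by this
     component might be held in a record like the one sketched below.
     The field names, the form of the channel identifier, and the example
     values are assumptions made for this sketch; this draft does not
     define a message format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZeroByteContext:
    """Hypothetical record of the fixed IP/UDP/RTP fields for one flow."""
    src_addr: str      # IP source address
    dst_addr: str      # IP destination address
    src_port: int      # UDP source port
    dst_port: int      # UDP destination port
    ssrc: int          # RTP synchronization source identifier
    payload_type: int  # RTP payload type chosen during VOIP signaling
    channel_id: int    # assumed identifier of the physical voice channel

    def flow_key(self):
        """Return the fields a classifier could match packets against."""
        return (self.src_addr, self.dst_addr,
                self.src_port, self.dst_port, self.ssrc)


# Example values only; the addresses come from documentation ranges.
ctx = ZeroByteContext("192.0.2.1", "198.51.100.7", 5004, 5006,
                      0x1234ABCD, 96, channel_id=1)
```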
  
  
  5.1.6 Other Non-Real-Time Applications
  
     We expect the terminal equipment to be a general-purpose computer
     that will have other applications running.  These applications may
     interact with other components such as the IP protocol stack, but in
     general will not be hard real-time tasks.  These applications must
     coexist with all the other components.
  
  
  5.2 Real-time Components
  
     Because we make use of the real-time nature of the physical channel,
     several components must be implemented as real-time tasks.  For a
     network model phone, this is similar to existing practice: a tightly
     integrated, real-time operating system on an embedded device schedules
     the audio sampling and playback to coincide with the physical frame
     rate of the wireless link.  For a relay model terminal, we wish to make
     use of the audio hardware on the connected terminal equipment.  This
     may require that the components be implemented using special real-time
     extensions to existing stock operating systems.
  
  
  5.2.1 Audio Hardware
  
     The audio hardware consists of the analog-to-digital (A/D) and digital-
     to-analog (D/A) converters used for sampling and playing back sound,
     along with the analog microphones and speakers.  In a network model
     phone this consists of the integrated equipment that is part of the
     phone.  In a relay model terminal it would be the "sound card" or other
     audio peripheral.
  
  
  5.2.2 Codec Implementation
  
     The codec implementation converts the sampled audio to and from the
     special wireless-specific encoding format.  For a network model phone,
     this encoding is carried out on dedicated Digital Signal Processing
     (DSP) hardware.  In a relay model terminal, we assume this is performed
     on the general purpose CPU of the terminal equipment.
  
  
  5.2.3 Physical Channel
  
     As mentioned before, there will be at least two physical channels
     supporting the mobile station: one that runs RLP retransmission,
     supporting the latency tolerant data applications; and another that
     resembles a voice circuit.  VOIP control signaling will traverse the
     data-oriented RLP channel, while the voice bearer traffic will traverse
     the real-time circuit-like channel.
  
     Both channels must be available to the upper layers regardless of
     whether a relay model or network model terminal is used.  The voice
     channel supports real-time traffic and performs no buffering.  It will
     send a frame at precise, periodic intervals, such as 20 milliseconds
     for cdma2000.  The codec implementation must be able to supply frames
     for the physical channel at exactly this rate.
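
     The timing budget implied by a 20 millisecond frame can be worked
     out as in the sketch below.  The 8000 Hz RTP clock rate is an
     assumption (it is typical for narrowband speech payloads) and the
     function names are ours.

```python
FRAME_MS = 20          # cdma2000 frame interval, as stated in the text
RTP_CLOCK_HZ = 8000    # assumed RTP clock rate for narrowband speech


def frames_per_second(frame_ms=FRAME_MS):
    """Number of physical frames the codec must supply each second."""
    return 1000 // frame_ms


def ticks_per_frame(frame_ms=FRAME_MS, clock_hz=RTP_CLOCK_HZ):
    """RTP timestamp units that elapse during one physical frame."""
    return clock_hz * frame_ms // 1000


def frame_deadline_ms(start_ms, frame_index, frame_ms=FRAME_MS):
    """Time by which frame `frame_index` must be ready for the channel."""
    return start_ms + frame_index * frame_ms
```

     Under these assumptions the codec supplies 50 frames per second and
     the RTP timestamp advances by 160 units per frame.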
  
  
  5.2.4 Header Compression/Decompression
  
     The codec implementation may be directly connected to the physical
     channel on the mobile terminal side, and so concrete IP/UDP/RTP headers
     may not necessarily appear inside the mobile terminal.  However, we do
     not prohibit a mobile terminal from reconstructing such headers if it
     requires them.  This component is drawn next to the data link layer in
     the diagram, and may in fact be integrated into the data link layer
     implementation.  It is responsible for classifying each packet coming
     down from the IP protocol stack against the fixed IP/UDP/RTP header
     fields we are attempting to compress.  The value of these fields is
     established by the Zero-Byte Control component and installed into the
     header compression component, possibly via the data link layer.  Once
     the header has been stripped this component must schedule the payload
     for transmission on the physical layer at the appropriate frame
     interval, according to the sequence number and timestamp received in
     the header.
  
     In the opposite direction, when packets arrive on the network side from
     the physical channel, this component is responsible for regenerating
     the proper IP/UDP/RTP header and passing the packet on to the IP
     protocol stack.  It makes use of the physical arrival time to generate
     the proper timestamp and sequence number in the RTP header.
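
     A minimal sketch of this regeneration step follows.  It assumes
     exactly one RTP packet per physical frame, a frame counter that
     starts at zero when the channel is established, and offsets
     previously supplied by the Zero-Byte Control component; silence
     suppression and lost frames are not modeled.

```python
def regenerate(frame_index, seq_offset, ts_offset, ticks_per_frame=160):
    """Derive (sequence number, timestamp) from a physical frame index.

    `ticks_per_frame` of 160 assumes 20 ms frames and an 8000 Hz clock.
    """
    seq = (seq_offset + frame_index) & 0xFFFF                        # 16 bits
    ts = (ts_offset + frame_index * ticks_per_frame) & 0xFFFFFFFF    # 32 bits
    return seq, ts
```

     Both fields wrap modulo their width, so the sketch behaves correctly
     when the offsets are close to the top of their ranges.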
  
     Because the header compression/decompression component is sending and
     receiving packets from the IP protocol stack, it may be implemented as
     a soft real-time component.  However, it must interact with the
     physical voice channel, which is a hard real-time component, both to
     properly record the frame arrival time and to schedule outgoing packets
     for transmission.  If the header compression/decompression is
     implemented in a separate network element from the physical channel, as
     is likely to be the case in the emerging cellular architectures [TIA-
     IS835], then this interaction could be accomplished with the proper use
     of sequence numbers on the interfaces between them so that each
     physical frame carries the information about when it arrived or when it
     is to be transmitted.
  
  
  6. Interfaces
  
     In this section we examine the interfaces between the above components.
     We distinguish between those interfaces that should be implemented as
     protocols, suitable for standardization in the IETF or elsewhere, and
     those that should remain Application Programming Interfaces (APIs) that
     may or may not need to be standardized.
  
  
  6.1 Protocol Reference Points
  
     In terms of new protocols, the interfaces that need to be standardized
     are listed below.  Some of these interfaces are opportunities for IETF
     protocols, while others should be carried out by other standards-
     setting organizations.
  
  
  6.1.1 Zero-Byte Control to Data Link Layer
  
     The Zero-Byte control component needs to negotiate the use of header
     stripping/generation with its peer and convey the static portion of the
     IP/UDP/RTP header to the peer.  This should be done in such a way that
     the network side is not required to participate in the VOIP control
     protocol.  This means the network side depends on the mobile station to
     inform it which RTP flows should be classified by the header
     compression component as appropriate for sending over the physical
     voice channel.  Rather than create a new network-layer
     protocol, we advocate using new data link messages between the two
     systems to convey this information.
  
  
  6.1.2 Data Link Layer to Physical Channel
  
     Mobile terminals running PPP will typically generate an octet stream
     that is appropriate for an underlying physical channel running RLP.
     However, prior to running PPP the mobile terminal must take steps to
     establish the channel.  Also, we require that the terminal be able to
     dynamically establish and release the voice channels used for real-time
     audio.  For a network model phone this may be supported by APIs within
     the phone, but for a relay model terminal this signaling needs to be
     carried out across a serial port.  Such signaling is usually the
     province of a modem control protocol ("AT commands") and
     standardization is probably best carried out in the International
     Telecommunications Union (ITU).  Note that in addition to the usual
     signaling to establish and release channels, we also need to obtain
     identifying information for each channel.  This information will be
     used by the Zero-Byte control component to communicate the initial
     timestamp and sequence number offsets to the peer.  It must be possible
     to signal this information during a running PPP session.
  
     Additional real-time information from the physical channel may improve
     the header compression.  For example, if the precise activation time of
     the channel is known and can be correlated with the RTP packet flow,
     the compressor could initialize and communicate the precise RARELY-
     CHANGING offset to the decompressor.  Precise information about other
     events that affect the offset, such as handoffs, buffer over/underflow,
     or clock drift between the physical channel and internal RTP timestamp,
     would also be useful.  If properly engineered this would allow for even
     OFFSET-PLIANT fields to be accurate most of the time, which could allow
     applications like SRTP to function adequately.
  
  
  6.1.3 Physical Channel to Codec or Header Compression/Decompression
  
     As stated above, the physical channel could interface directly to the
     codec implementation on the mobile station side and to a header
     compression/decompression process on the network side.  For a network
     model phone, the codec interface may be a proprietary API.  However,
     for a relay model terminal, we must standardize a new way to transport
     the frames across a serial connection in real-time.  This will require
     that we multiplex the real-time frames with the non-real-time data for
     PPP.  This multiplexing could be carried out with the use of escape
     characters on the serial interface; again, this work is probably best
     carried out within the ITU.  Any new special characters would need to
     be properly inserted into the ACCM of the PPP implementation.
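
     One possible form of such multiplexing is sketched below.  The
     marker values (DLE, STX, ETX) are hypothetical and chosen only for
     illustration; no assignment of special characters is made by this
     draft.  The sketch assumes that PPP never emits the escape octet
     unmapped, which is what setting its bit in the ACCM would ensure.

```python
DLE, STX, ETX = 0x10, 0x02, 0x03   # hypothetical marker octets
ACCM_BIT = 1 << DLE                # ACCM bit n corresponds to octet value n


def wrap_voice_frame(payload):
    """Delimit a voice frame, doubling any DLE found inside the payload."""
    out = bytearray([DLE, STX])
    for b in payload:
        out += bytes([DLE, DLE]) if b == DLE else bytes([b])
    out += bytes([DLE, ETX])
    return bytes(out)


def demultiplex(stream):
    """Split a serial octet stream into (ppp_octets, [voice_frames])."""
    ppp, frames, cur = bytearray(), [], None
    i = 0
    while i < len(stream):
        b = stream[i]
        if b == DLE and i + 1 < len(stream):
            nxt = stream[i + 1]
            i += 2
            if nxt == STX and cur is None:
                cur = bytearray()            # start of a voice frame
            elif nxt == ETX and cur is not None:
                frames.append(bytes(cur))    # end of a voice frame
                cur = None
            elif nxt == DLE and cur is not None:
                cur.append(DLE)              # escaped DLE inside a frame
            # any other pair is malformed and is dropped in this sketch
            continue
        (cur if cur is not None else ppp).append(b)
        i += 1
    return bytes(ppp), frames
```

     Octets outside a delimited frame are passed through untouched to the
     PPP implementation, so the non-real-time path is unaffected.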
  
     On the network side, the physical voice channel may be separated from
     the header compression/decompression process by an IP network.  If this
     is the case then each physical frame must carry a sequence number that
     indicates the exact frame time that it was received or is to be
     transmitted over the air.  Standardization of such interfaces is best
     carried out within the 3rd Generation Partnership Projects (3GPPs).
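
     As a sketch of what such a per-frame sequence number could look
     like, the example below prefixes each voice frame with a 32-bit
     frame number.  The field width and layout are assumptions made for
     illustration and are not taken from any existing interface
     specification.

```python
import struct


def tag_frame(frame_number, payload):
    """Prefix a voice frame with a 32-bit frame number (assumed layout)."""
    return struct.pack("!I", frame_number & 0xFFFFFFFF) + payload


def untag_frame(data):
    """Split a tagged frame into (frame_number, payload)."""
    (frame_number,) = struct.unpack_from("!I", data)
    return frame_number, data[4:]


def in_air_order(tagged_frames):
    """Restore over-the-air order after reordering in the IP network.

    Wraparound of the 32-bit counter is ignored in this sketch.
    """
    return sorted((untag_frame(f) for f in tagged_frames),
                  key=lambda item: item[0])
```

     Because the frame number travels with the frame, the element that
     regenerates headers does not itself need hard real-time access to
     the physical channel.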
  
  
  6.1.4 Remote VOIP Application to Zero-Byte Control
  
     If the RTP generating application is remote from the wireless link,
     i.e., there is at least one IP hop separating it from the compressor,
     then it will not have direct access to the zero-byte control component.
     Some network layer protocol must be introduced if it is to take
     advantage of zero-byte header compression.  This protocol could be
     similar to the hints that have been introduced into RSVP [Davie00],
     although we note that the flow specifications in RSVP are not likely to
     be flexible enough to specify packet flows that contain layers of
     encapsulation.
  
  
  6.2 API Reference Points
  
     Other interfaces between the components are best done as Application
     Programming Interfaces (APIs) and may or may not need to be
     standardized.  In any case we do not advocate the standardization of
     APIs within the IETF and we discuss these interfaces for illustration
     purposes only.
  
  
  6.2.1 VOIP Control to Zero-Byte Control
  
     The VOIP control component is responsible for end-to-end VOIP signaling
     such as SIP [Handley00] or H.323 [ITU-H323].  We expect these
     applications to be implemented by many different people and to use
     standard operating system interfaces.  Also, these applications should
     work the same way when used in wireless or wireline settings, except
     that the codecs should be tailored for the specific link layer
     currently in use.
  
     When used over wireless links, applications may want to make use of the
     optimized real-time path outlined above (audio hardware to codec to
     physical channel) rather than taking audio data into user space,
     performing a user space codec transformation, constructing RTP packets,
     and writing them to a standard UDP socket.  Such user space
     manipulation of audio traffic could introduce unpredictable latency to
     the flow, depending on the operating system characteristics.
  
     To enable the optimized real-time path, the VOIP control protocol
     should signal to the Zero-Byte control component that it has completed
     VOIP signaling and is ready to begin audio bearer flow.  This signal
     might be a system call containing the IP/UDP/RTP parameters that have
     been negotiated and the codec to be used.  This system call would be a
     one-line addition to existing VOIP client implementations.
  
  
  6.2.2 Zero-Byte Control to Real-time Path
  
     When the Zero-Byte control component receives a signal from the VOIP
     control component that the VOIP signaling has been completed, it must
     take the following steps:
  
       1) Open the new physical voice bearer channel;
  
       2) Send the peer system information about the flow, including the
          static header fields and identification of the physical bearer
          channel; and, finally,

       3) Trigger the audio hardware to begin sampling, and the codec
          implementation to begin encoding/decoding.
  
     The first step could be accomplished via an interface to the data link
     layer, or may be accomplished directly.   In the second step, the
     existing PPP connection is used to inform the peer what header fields
     should be attached to the synchronous voice frames, beginning with any
     convenient nearby starting point, such as the first frame received.
     The third step requires interaction with the real-time components such
     as the audio hardware and codec implementation, to enable the real-time
     data to start flowing.
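
     The ordering of the three steps can be sketched as follows.  The
     three callables stand in for whatever platform-specific interfaces
     exist; their names and signatures are hypothetical.

```python
def start_real_time_path(open_voice_channel, send_context, start_audio,
                         static_fields):
    """Run the three steps in order and return the channel identifier.

    open_voice_channel() -> channel_id          (step 1)
    send_context(static_fields, channel_id)     (step 2)
    start_audio(channel_id)                     (step 3)
    """
    channel_id = open_voice_channel()
    send_context(static_fields, channel_id)
    start_audio(channel_id)
    return channel_id


# Usage with stand-in callables that only record the order of the calls.
calls = []
channel = start_real_time_path(
    lambda: calls.append("open") or 7,
    lambda fields, ch: calls.append(("send", ch)),
    lambda ch: calls.append(("start", ch)),
    {"ssrc": 0x1234ABCD},
)
```

     The peer is told about the flow before audio starts, so the first
     voice frame it receives already has a context to be matched against.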
  
     If additional real-time channel information is available concerning
     establishment time, handoff, clock drift, and buffer over/underflow,
     then further features could be implemented to improve the
     transparency of the scheme.  Whenever an event takes place that
     requires re-synchronization of the compression state, such as a
     physical layer reset (hard handoff) or sequence number slippage due to
     clock drift, the Zero-Byte control component would update its peer with
     the appropriate state.  This update should include an offset,
     calculated from the time the channel was established or reset,
     indicating to which physical layer frame the update applies.  Such
     offset-indicating updates should also be sent when any of the normally
     static header fields, such as TTL, TOS, or CSRCs, change.  This will
     enable completely transparent decompression of RTP header fields for
     most packets.
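
     One way the receiving side might apply offset-indicating updates is
     sketched below.  The class and field names are hypothetical, and the
     sketch assumes that updates arrive in increasing frame order.

```python
from bisect import bisect_right


class ContextTimeline:
    """Hypothetical store of context updates keyed by frame offset."""

    def __init__(self, initial):
        self._offsets = [0]                 # frames since setup or reset
        self._contexts = [dict(initial)]

    def update(self, frame_offset, **changes):
        """Record that `changes` take effect at `frame_offset`."""
        if frame_offset <= self._offsets[-1]:
            raise ValueError("updates must arrive in increasing frame order")
        context = dict(self._contexts[-1])
        context.update(changes)
        self._offsets.append(frame_offset)
        self._contexts.append(context)

    def context_for(self, frame_index):
        """Return the context in force for the given physical frame."""
        if frame_index < 0:
            raise ValueError("frame index must be non-negative")
        return self._contexts[bisect_right(self._offsets, frame_index) - 1]


timeline = ContextTimeline({"ttl": 64, "seq_offset": 100})
timeline.update(150, ttl=63)    # TTL changes starting with frame 150
```

     Frames before the offset are decompressed with the old field values
     and frames from the offset onward with the new ones.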
  
  
  6.2.3 Header Compression/Decompression to Data Link Layer
  
     The header compression component must classify all traffic from the IP
     protocol stack as to whether it is part of the RTP flow that needs to
     be sent on the voice physical channel.  Because it must examine each
     packet, it will probably be fairly tightly integrated with the data
     link layer.
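
     A per-packet classification check might look like the sketch below.
     It assumes IPv4 without fragmentation and matches on addresses,
     ports, and SSRC; the choice of match fields and the function names
     are ours.

```python
import struct


def matches_flow(packet, flow):
    """Return True if an IPv4/UDP/RTP packet belongs to the given flow.

    `flow` is (src_addr, dst_addr, src_port, dst_port, ssrc), with the
    addresses given as 4-byte strings.
    """
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return False                      # not IPv4
    ihl = (packet[0] & 0x0F) * 4          # IP header length in octets
    if ihl < 20 or packet[9] != 17:       # protocol 17 is UDP
        return False
    if len(packet) < ihl + 8 + 12:        # UDP header plus fixed RTP header
        return False
    src, dst = packet[12:16], packet[16:20]
    sport, dport = struct.unpack_from("!HH", packet, ihl)
    (ssrc,) = struct.unpack_from("!I", packet, ihl + 8 + 8)
    return (src, dst, sport, dport, ssrc) == flow


def example_packet(src, dst, sport, dport, ssrc, payload=b""):
    """Build a minimal test packet (checksums left as zero)."""
    rtp = struct.pack("!BBHII", 0x80, 96, 0, 0, ssrc) + payload
    udp = struct.pack("!HHHH", sport, dport, 8 + len(rtp), 0)
    ip = struct.pack("!BBHHHBBH", 0x45, 0, 20 + len(udp) + len(rtp),
                     0, 0, 64, 17, 0) + src + dst
    return ip + udp + rtp
```

     Packets that do not match are left on the ordinary data path through
     the data link layer.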
  
     The header decompression component produces IP packets from the
     physical voice frames and sends them up the IP protocol stack.  Getting
     packets to the IP protocol stack may be implemented by passing the
     packets through the data link layer.
  
  
  6.2.4 Other Interfaces
  
     The mobile terminal potentially will be executing many simultaneous
     applications and we expect all of the standard interfaces (network
     sockets, GUI) to be present.  Note that ordinary applications may want
     to use the audio hardware at the same time as a voice call is in
     progress.  This could be disallowed, or a special "audio mixer" process
     could be introduced between the audio hardware and the codec
     implementation to allow such simultaneous access.  For example, a
     system beep noise might be mixed into the telephone call in such a way
     that only the mobile terminal user would hear it.
  
     Much ado has been made about the proper reconstruction of the IP
     Identification field for each RTP packet.  We note that RTP payloads
     are required to stay within the path MTU [Handley99] and should never
     experience fragmentation.  However, in order to avoid any possibility
     of Identification field collision with other packets that may be
     fragmented, a new interface could be implemented between the Zero-Byte
     control and the IP protocol stack to "reserve" a range of
     Identification values for use by the RTP flow.  If the header
     decompression component always increments the Identification field by
     one for each reconstructed header, and wraps around to the beginning
     when the range is about to overflow, then no additional work is
     necessary to ensure uniqueness of IP Identification fields.
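
     The increment-and-wrap behavior over a reserved range can be
     sketched as follows.  The class name and the example range are
     hypothetical; how a range is actually reserved from the IP protocol
     stack is not specified here.

```python
class IdRange:
    """Hypothetical reserved range [first, last] of IP Identification values."""

    def __init__(self, first, last):
        if not 0 <= first <= last <= 0xFFFF:
            raise ValueError("range must lie within the 16-bit field")
        self.first, self.last = first, last
        self._next = first

    def next_id(self):
        """Return the next value, wrapping to `first` after `last`."""
        value = self._next
        self._next = self.first if value == self.last else value + 1
        return value


ids = IdRange(0xFF00, 0xFF02)
sequence = [ids.next_id() for _ in range(5)]
```

     Values never leave the reserved range, so they cannot collide with
     identifiers the IP protocol stack assigns to other packets.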
  
  
  7. Conclusions
  
     This draft has presented an architecture for zero-byte header
     compression and its implications for both a mobile station and the
     supporting network.  On the network side, with this architecture the
     peer in the network does not need to be aware of the VOIP control
     between the mobile and a SIP/H.323 server that could be anywhere in the
     network.  When the header compression/decompression is performed in a
     network element that is physically separated from the physical channel
     (e.g. a PDSN from 3GPP2 [TIA-IS835]), the hard real-time requirements
     on this element can be alleviated through the proper use of sequence
     numbers on its interface to the radio channel elements.
  
     On the mobile side, this draft provides high level requirements for
     support of zero-byte header compression in the form of protocol
     interfaces and APIs.  Both monolithic network model mobiles and relay
     model mobiles attached to laptops are discussed.  Proper architecture of
     the mobile station allows the segregation of hard real-time processing
     from the non-real-time IP stack and applications.  Furthermore,
     convergence of wireline and wireless applications is a long-standing
     goal in the wireless industry.  This architecture allows VOIP-based
     applications developed for wireline access to run on mobile end
     systems and operate in the wireless environment (although with
     wireless-specific codecs).  The impact on VOIP applications could be as
     little as one line of code in the VOIP client itself.
  
     Finally, the draft has outlined protocol work items suitable for the
     IETF as well as external standards bodies, including the ITU and 3rd
     Generation Partnership Projects.  Any necessary APIs could be
     standardized by a collaboration between operating system vendors (open
     source or otherwise) and third party application developers, driven by
     wireless service providers.


  8. References
  
     [Bormann01]    Bormann, C. (ed.), "RObust Header Compression (ROHC),"
                    RFC 3095, March 2001.
  
     [Braden89]     Braden, R. (ed.), "Requirements for Internet Hosts --
                    Communication Layers," RFC 1122, October 1989.
  
     [Bradner96]    Bradner, S., "The Internet Standards Process, Revision
                    3," RFC 2026, October 1996.
  
     [Davie00]      Davie, B., Iturralde, C., Oran, D., Casner, S.,
                    Wroclawski, J., "Integrated Services in the Presence of
                    Compressible Flows," RFC 3006, November 2000.
  
     [ETSI-AMR]     European Telecommunications Standards Institute,
                    "Adaptive Multi-Rate (AMR) Speech Transcoding," 3G TS
                    26.090, February 2000.
  
     [ETSI-LLC]     European Telecommunications Standards Institute, GSM
                    04.64.
  
     [ETSI-PDCP]    European Telecommunications Standards Institute, 3G TS
                    25.323.
  
     [Fairhurst01]  Fairhurst, G., and Wood, L., "Link ARQ Issues for IP
                    Traffic," draft-ietf-pilc-link-arq-issues-01.txt, March
                    2001.  Work In Progress.
  
     [Handley99]    Handley, M., and Perkins, C., "Guidelines for Writers of
                    RTP Payload Format Specifications," RFC 2736, December
                    1999.
  
     [Handley00]    Handley, Schulzrinne, Schooler, Rosenberg, "SIP: Session
                    Initiation Protocol," draft-ietf-sip-rfc2543bis-01.txt,
                    August 2000.  Work In Progress.
  
     [ITU-H323]     International Telecommunications Union, "Packet Based
                    Multimedia Communications Systems," ITU-T Rec. H.323,
                    September 1999.
  
     [Li01]         Li, A., (editor), "An RTP Payload Format for EVRC
                    Speech," draft-ietf-avt-evrc-03.txt, May 2001.  Work In
                    Progress.
  
     [Shulzrinne96] Schulzrinne, H., Casner, S., Frederick, R., and
                    Jacobson, V., "RTP: A Transport Protocol for Real-Time
                    Applications," RFC 1889, January 1996.
  
     [TIA-IS127]    Telecommunications Industry Association, "Enhanced
                    Variable Rate Codec, Speech Service 3 for Wideband
                    Spread Spectrum Digital Systems," TIA/EIA/IS-127,
                    February 1997.
  
     [TIA-IS835]    Telecommunications Industry Association, "Wireless IP
                    Network Standard," TIA/EIA/IS-835, June 2000.
  
     [TIA-SMV]      Telecommunications Industry Association, "Selectable
                    Mode Vocoder Service Option for Wideband Spread Spectrum
                    Communication Systems," TIA PN4575, 3GPP2 C.P9001, 1997.
  
  
  9. Authors' Addresses
  
     Peter J. McCann
     Lucent Technologies
     Rm 2Z-305
     263 Shuman Blvd
     Naperville, IL  60566-7050
     USA
  
     Phone: +1 630 713 9359
     FAX:   +1 630 713 4982
     EMail: mccap@lucent.com
  
     Tom Hiller
     Lucent Technologies
     Rm 2F-218
     263 Shuman Blvd
     Naperville, IL  60566-7050
     USA
  
     Phone: +1 630 979 7673
     FAX:   +1 630 979 7673
     EMail: tom.hiller@lucent.com
  
  
  Acknowledgements
  
     Thanks to Paul Francis for some of the terminology and concepts
     introduced in Section 4.2.


  Intellectual Property Statement
  
     The IETF takes no position regarding the validity or scope of any
     intellectual property or other rights that might be claimed to pertain
     to the implementation or use of the technology described in this
     document or the extent to which any license under such rights might or
     might not be available; neither does it represent that it has made any
     effort to identify any such rights.  Information on the IETF's
     procedures with respect to rights in standards-track and standards-
     related documentation can be found in BCP-11.  Copies of claims of
     rights made available for publication and any assurances of licenses to
     be made available, or the result of an attempt made to obtain a general
     license or permission for the use of such proprietary rights by
     implementers or users of this specification can be obtained from the
     IETF Secretariat.
  
     The IETF invites any interested party to bring to its attention any
     copyrights, patents or patent applications, or other proprietary rights
     that may cover technology that may be required to practice this
     standard.  Please address the information to the IETF Executive
     Director.
  
  
  Full Copyright Statement
  
  
     Copyright (C) The Internet Society (2001). All Rights Reserved. This
     document and translations of it may be copied and furnished to others,
     and derivative works that comment on or otherwise explain it or assist
     in its implementation may be prepared, copied, published and
     distributed, in whole or in part, without restriction of any kind,
     provided that the above copyright notice and this paragraph are
     included on all such copies and derivative works. However, this
     document itself may not be modified in any way, such as by removing the
     copyright notice or references to the Internet Society or other
     Internet organizations, except as needed for the purpose of developing
     Internet standards in which case the procedures for copyrights defined
     in the Internet Standards process must be followed, or as required to
     translate it into languages other than English.
  
     The limited permissions granted above are perpetual and will not be
     revoked by the Internet Society or its successors or assigns.
  
     This document and the information contained herein is provided on an
     "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
     TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT
     NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL
     NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR
     FITNESS FOR A PARTICULAR PURPOSE.