Internet Engineering Task Force                        AVT Working Group
Internet Draft                                J.Rosenberg, H.Schulzrinne
draft-ietf-avt-muxissues-00.txt                     Bell Labs/Columbia U.
October 1, 1998
Expires: March 1999


                Issues and Options for RTP Multiplexing

STATUS OF THIS MEMO

   This document is an Internet-Draft. Internet-Drafts are working docu-
   ments of the Internet Engineering Task Force (IETF), its areas, and
   its working groups.  Note that other groups may also distribute work-
   ing documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference mate-
   rial or to cite them other than as ``work in progress''.

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

   Distribution of this document is unlimited.


                                 ABSTRACT


        This memorandum discusses the issues and options involved
        in the design of a new transport protocol for multiplexed
        voice within a single packet. The intended application is
        the interconnection of devices which provide trunking or
        long distance telephone service over the Internet. Such
        devices have many voice connections simultaneously between
        them. Multiplexing them into the same connection improves
        efficiency, enables the use of low bitrate voice codecs,
        and improves scalability. Options and issues concerning
        timestamping, payload type identification, length
        indication, and channel identification are discussed. Sev-
        eral possible header formats are identified, and their
        efficiencies are compared.
J.Rosenberg, H.Schulzrinne                                    [Page 1]


Internet Draft               RTP Mux Issues              October 1, 1998

1 Introduction


   Internet telephone gateways (ITGs) allow a public switched telephone
   network (PSTN) user to contact another PSTN user, with the long
   distance portion of the call routed over the Internet. Such a
   scenario is depicted in Figure 1.


        ~~~~~~~~    -------     ~~~~~~~~~~     -------    ~~~~~~~~
   A --|        |  |       |   |          |   |       |  |        |-- C
       |  PSTN  |--|  ITG  |---|  IP NET  |---|  ITG  |--|  PSTN  |
   B --|   X    |  |   J   |   |          |   |   K   |  |   Y    |-- D
        ~~~~~~~~    -------     ~~~~~~~~~~     -------    ~~~~~~~~



   Figure 1: Internet telephony gateway architecture

   Subscribers A and B connect to ITG J via their local telephone net-
   work, X. A wishes to speak with user C, and B wishes to speak with
   user D, both of which are connected to local phone network Y.  To
   complete the call, ITG J packetizes and transports the voice to and
   from A and B through the IP network, to remote gateway K. There, ITG
   K completes the calls to C and D through PSTN Y. This type of
   arrangement, in which many calls share a common destination, may be
   particularly common when connecting the PBXs of corporate branch
   offices across the Internet.

   In this scenario, ITGs J and K act as Internet hosts, which are
   effectively proxies for the telephone users connected to them. Unlike
   typical Internet telephony, however, there will often be multiple
   active calls between a pair of gateways, each representing a
   different pair of users. Gateways can signal calls using SIP [1],
   H.323, or proprietary signalling protocols. Media data is transported
   via a separate RTP [2] session for each user.

   We observe that using a separate RTP session for each user connected
   between a pair of gateways is wasteful. Rather, it would be more
   efficient to multiplex users between a pair of gateways into a single
   RTP session. A number of proposals have been made for RTP extensions
   to accomplish this multiplexing [3][4][5].

   This memo discusses some of the issues and options for multiplexing
   users within RTP between a pair of gateways. There are other
   applications for RTP multiplexing, such as transport of RTP in a
   switched RTP network, depicted in Figure 2. In this scenario, an
   entity which we call an RTP Switch receives some number of RTP
   multiplexed connections. It extracts the multiplexed payloads from
   each of the received multiplex streams, switches the payloads, and
   generates a new set of RTP multiplexed streams. These streams may be
   destined for other RTP Switches, or for telephony gateways.


   The switched scenario allows better network utilization. When RTP
   multiplexing is allowed only between pairs of gateways, the result is
   effectively a full mesh RTP network, in which the number of
   multiplexed users between any pair of gateways may become small as
   the number of gateways grows. An RTP switched network would allow for
   greater multiplexing. However, it comes at the significant cost of
   management and dynamic routing requirements, and it introduces a
   central point of failure.

   These scenarios have differing requirements. In this document, we
   focus on the gateway to gateway case in Figure 1.


                -------                             --------
              |       | RTPMux  ---------  RTPMux |        |
              |  ITG  |--------|         |--------|  ITG 3 |
              |   1   |        |         |         --------
               -------         |         | RTPMux  --------
               -------         |  RTP    |--------|        |
              |       | RTPMux | Switch  |        |  ITG 4 |
              |  ITG  |--------|         |         --------
              |   2   |        |         | RTPMux  --------
               -------         |         |--------|        |
                               |         |        |  ITG 5 |
                                 ---------          --------



   Figure 2: Switched RTP network architecture

2 Terminology

     o User: One of the individuals who has data within the RTP packet.

     o Connection: The point to point RTP session between two ITGs.

     o Channel: A virtual connection which is established by allowing a
       user to send data within a packet. There are many channels per
       connection - this represents the multiplexing.

     o Channel Identifier: A number which identifies a channel.

     o Block: The section of the payload of a packet which contains data
       for a particular user.

3 Requirements

   The transport protocol must provide, at a minimum, the following
   functionality:


     1.   Delineation: Data from different users must be clearly delin-
          eated.

     2.   Identification: The channel to which the data belongs must be
          identified.

     3.   Variable lengths: The protocol should support variable length
          blocks from a particular user. This allows for variable rate
          codecs and adjustment of packetization delays.

     4.   Low overhead: Since the protocol is designed for low rate
          voice, it should have low overhead. This issue is extremely
          important. New coders are emerging which can support near toll
          quality at 5.3 kbps, and acceptable quality at rates even as
          low as 4 kbps. It is desirable to support such codecs, as they
          can reduce the cost of providing an ITG service. Furthermore,
          advances in coding technology indicate that it is desirable to
          send very low bitrate information (1 kbps or less) during
          silence periods, so that background noise can be reproduced
          well (as opposed to sending nothing). Support of such rates
          requires a protocol with low overhead.

     5.   Marker: A general purpose marker bit should be available for
          all users within the connection.

     6.   Payload Identification: The codec in use for each user should
          be indicated somehow. It is a requirement to allow for the
          coding type to change during the lifetime of a channel.

4 Issues

   The following section identifies a number of issues which have an
   impact on the design of the protocol. It also identifies a variety of
   options for providing the specific services of the protocol.

4.1 Payload type identification

   There are a number of ways to identify the coding of the payload.
   They include in-band static types, in-band dynamic types, or out-of-
   band. The in-band approaches are based on some kind of payload type
   identifier, the semantics of which are either known a priori
   (static), or signaled ahead of time (dynamic). The out-of-band
   techniques signal a binding between the channel identifier and a
   coder at the beginning of (or even during) the lifetime of the
   connection.

   With out-of-band signaling, synchronizing the signaling with the
   media stream is a major issue. The synchronization can be
   accomplished with either timestamps or sequence numbers.


   One approach to performing the synchronization is as follows: The
   source sends a message reliably (e.g., over TCP) to the receiver,
   indicating that it will change codings at timestamp N, where N is
   some future timestamp (or sequence number, SN). N should be chosen
   far enough into the future to guarantee that the receiver will get
   the message before time N. The farther away N is, the more robust the
   system becomes, but the source also loses its ability to adapt
   quickly.

   There are also several options for simple in-band signaling methods
   which can assist in error recovery. These are based on the assumption
   that it is better for the receiver to know that the encoding has
   changed (even though it doesn't know to what) than to know nothing.
   This avoids playing garbage out. A coding sequence number can be
   carried in the header. Such a number starts at zero. At the timestamp
   where the encoding changes, the coding SN increments, and stays
   incremented until the next change. In this fashion, we are guaranteed
   that the receiver will never play out data using the wrong coding
   type. Probably just one or two bits of this SN are necessary.
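
   The in-band coding sequence number check can be sketched as follows.
   This is only an illustration under stated assumptions: the function
   name, its arguments, and the one-bit field width are hypothetical
   and are not defined by this memo.

```python
CODING_SN_BITS = 1  # assumed field width; the memo suggests one or two bits


def select_codec(block_sn, current_sn, current_codec, announced_codec=None):
    """Pick the codec for a block, or None if the block must not be played.

    block_sn        -- coding SN carried in the block header
    current_sn      -- coding SN the receiver last synchronized to
    current_codec   -- codec bound to current_sn
    announced_codec -- next codec, if the out-of-band message has arrived
    """
    modulus = 1 << CODING_SN_BITS
    if block_sn == current_sn:
        return current_codec
    if announced_codec is not None and block_sn == (current_sn + 1) % modulus:
        return announced_codec
    # The coding changed but the receiver does not know to what: discard
    # the block instead of decoding it with the wrong codec.
    return None
```

   A block whose coding SN differs from the receiver's current value is
   decoded only if the out-of-band announcement has already arrived;
   otherwise it is dropped rather than played out as garbage.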

   Using in-band payload types allows the coding to be explicitly
   indicated for each packet. This eliminates synchronization problems,
   and allows the sender to change encodings without out-of-band
   signaling. This flexibility is the reason in-band payload types were
   used for generic RTP in the first place. By using dynamic types, the
   number of bits for the encoding can be reduced by limiting the number
   of codecs that can be used simultaneously during a session.

   Our conclusion is that it is desirable to have the payload type
   identifier (PTI) field in the payload (i.e., in-band). This makes it
   possible to do more robust rate control, which becomes a significant
   issue when multiple channels are multiplexed together (and therefore
   the aggregate bitrate increases). It also makes sense to signal a
   table of encodings for the payload type at the beginning of the
   connection. Any particular pair of ITGs will generally only support a
   few codecs. Therefore, dynamically assigning the meanings of the PTI
   values makes a more compact representation possible without
   restricting the set of codecs which may be used.
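
   As a rough illustration of how a dynamically signaled table keeps
   the PTI field small, the sketch below assigns consecutive PTI values
   to the codecs agreed on at setup and reports the field width needed.
   The function name and the codec names in the usage note are
   illustrative assumptions, not part of any proposal discussed here.

```python
def build_pti_table(negotiated_codecs):
    """Map small PTI values to the codecs agreed on at connection setup.

    Returns (table, bits), where bits is the PTI field width needed to
    distinguish the negotiated codecs.
    """
    if not negotiated_codecs:
        raise ValueError("at least one codec must be negotiated")
    table = dict(enumerate(negotiated_codecs))
    # n codecs need ceil(log2(n)) bits; keep at least one bit.
    bits = max(1, (len(negotiated_codecs) - 1).bit_length())
    return table, bits
```

   For example, a pair of gateways that negotiates three codecs (say
   "G.711", "G.729", and "G.723.1") needs only a 2-bit PTI, compared
   with the 7-bit payload type of generic RTP.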

4.2 Timestamps

   Timing is a very complex issue for the multiplexing protocol. The
   first question related to it is whether the protocol will support
   mixing of media derived from separate clocks (e.g., voice and video).
   Although doing this seems attractive, it is complex and in opposition
   to the philosophy under which RTP was developed. RTP explicitly
   states that separate media should be placed in separate RTP streams.
   This allows for different QoS to be requested for each medium, and for
   clocks to be defined based on the media type. Furthermore, this pro-
   file is geared towards the aggregation of voice traffic generated
   from the POTS across the Internet. As a result, all data is derived
   from a single 125us sample clock.

   The next basic question is whether timestamps are needed globally,
   i.e., just one per packet independent of the number of users, or
   locally, whereby each user within a packet needs their own timestamp.
   A separate question is the representation of these timestamps in an
   efficient manner. When considering these questions, the criteria to
   keep in mind are:

     1.   Can silence periods be recovered correctly?

     2.   Can resynchronization occur in the face of packet loss?

     3.   What is the impact on playout buffering and jitter
          computation?

   The answers to these questions depend on the desired capabilities of
   the protocol. In the most general case, it is possible to have
   different frame sizes for each user (for example, 20ms, 10ms, and
   15ms) within the same packet. These frames can be arbitrarily aligned
   in time with respect to each other (e.g., the 20ms frame starts 5.3
   ms after the beginning of another user's 10 ms frame). The source can
   send packets off at any point, containing data from those users whose
   frames have been generated before the packet departure time. A
   somewhat more restrictive capability is to allow for different frame
   sizes and time alignments, but to require that all frames within any
   one packet have the same frame size and are aligned in time. The most
   restrictive case is to require separate RTP sessions for users with
   different frame sizes. This requires a channel to be torn down and
   set up again when it changes codec. The desire to perform flow
   control on a channel-by-channel basis makes this approach
   unacceptable, and it is not considered further.

4.2.1 General Case

   First consider the general case. Packets can contain frames from some
   or all of the users, and those frames are not the same length nor
   time aligned in any way. An example of such a scenario is depicted in
   Figure 3. In the figure, there are three sources, and the ti corre-
   spond to the times of packet emissions. When packets are lost, the
   variability in the amount and time alignment of data in each packet
   makes it impossible to reconstruct how much time had elapsed based
   solely on sequence numbers (such reconstruction IS possible in the
   single user case). Furthermore, the amount of time elapsed can easily
   vary from user to user, and therefore local timestamps are needed.

   The general case introduces further complications which have to do
   with jitter and delay computation. Such computations are needed for
   RTCP reporting and possibly for the estimation of network delays,
   used in dynamic playout buffers. In the single user case, the jitter
   is computed between each packet as:

   D(i,j) = (Rj - Ri) - (Sj - Si)

   Where the Ri correspond to the reception times at the receiver mea-
   sured in RTP time, and the Si are the RTP timestamps in the data
   packets. The delay is computed as the difference between the arrival
   time at the receiver and generation time, as indicated by the RTP
   timestamp.
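
   The single user computations above can be written down directly. In
   the sketch below, the function names are hypothetical, and the
   running average with gain 1/16 is the smoothing used by the RTP
   specification for its interarrival jitter estimate; it is included
   as an assumption, since this memo only defines D(i,j).

```python
def transit_difference(r_i, s_i, r_j, s_j):
    """D(i,j) = (Rj - Ri) - (Sj - Si), all values in RTP timestamp units."""
    return (r_j - r_i) - (s_j - s_i)


def one_way_delay(r_i, s_i):
    """Delay of packet i: arrival time minus RTP timestamp."""
    return r_i - s_i


def update_jitter(jitter, d):
    """Smooth |D| into a running jitter estimate with gain 1/16."""
    return jitter + (abs(d) - jitter) / 16.0
```

   With an 8 kHz clock, two packets stamped 160 ticks apart that arrive
   170 ticks apart give D = 10 ticks, i.e. 1.25 ms of added delay
   variation.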

   In the multiple user case, these definitions no longer make sense, as
   there is no single RTP timestamp any longer. Each arriving packet
   will have a single arriving time (Ri), but multiple sending times
   (Si,j) for each block j in the ith packet. There are a number of
   alternatives for delay and jitter computation in this case: compute
   such information for all users, compute such information for a single
   user, or generate a single delay and jitter estimate, but have it be
   based on information from all users. There are pros and cons to each
   approach.

   First of all, it is possible for different blocks to experience dif-
   ferent delays (and jitters) even though they are within the same
   packet. This is because the general scenario allows for significant
   variability, whereby blocks may either vary in size from packet to
   packet and within a packet, or not be transmitted immediately after
   their completion (the latter happens to source B in Figure 3). Thus,
   it is arguable that it may be desirable to perform adaptive playout
   buffering separately for each user, which would require the storage
   and computation of delays for each user.

   The second alternative is to compute the delays for a single user,
   and use that information to size all of the other playout buffers.
   This may be sub-optimal in terms of delay and loss, depending on what
   fraction of the total delay and jitter are introduced by the packeti-
   zation itself. There is a second disadvantage to this approach, how-
   ever. When that particular user enters a silence period, delay and
   jitter information is no longer being received, and so estimates of
   network delay stop adapting. This implies that delay estimates will
   be old for certain periods of time. An alternative is to change the
   user from which delay and jitter estimates are being collected.

   The third alternative is to compute delay estimates based on some
   measure derived from all of the users. There are several reasonable
   approaches. For example, the delay estimate can be computed as:

   Delay(i) = max over j of (Ri - Si,j)

   which would yield a conservative estimate of the delay for some
   users. This approach requires storage of only a single set of delay
   information, although computation still grows with the number of
   users in a packet.
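
   A minimal sketch of this third alternative follows, assuming each
   block carries its own send timestamp in the same units as the
   arrival time. The function name is hypothetical.

```python
def packet_delay_estimate(arrival_time, block_send_times):
    """Conservative per-packet delay: the largest Ri - Si,j over all blocks j."""
    if not block_send_times:
        raise ValueError("packet contains no blocks")
    return max(arrival_time - s for s in block_send_times)
```

   Only one value is stored per packet, but the loop still visits every
   block, which is the computation cost noted above.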



                   ---------------------------------
         Source A |           |           |
                   ---------------------------------
         Source B |    |    |    |    |    |    |
                   ---------------------------------
         Source C |     |     |     |     |     |
                   ---------------------------------

                        |   | |  |    |   ||    |
                       t1  t2t3 t4   t5  t6t7   t8

            -------------------------------------- time




   Figure 3: Global Timestamp Problem

   Sending local timestamps also requires extra bits in the block head-
   ers. It is possible, however, to use offsets for the local times-
   tamps. A global timestamp can be used in the RTP header (the field
   already exists), and each user has a modifier to indicate position in
   time relative to that timestamp.

   A related question is how big to make the offset field. This offset
   is bounded by the difference in time between the earliest and latest
   samples within a packet. Clearly, this itself is bounded by the pack-
   etization delay at the source. For this application, if we assume a
   125us sample clock, and bound packetization delays to 100ms, the off-
   set field is bounded by 800 ticks, requiring 10 bits.
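
   The sizing argument above is a short calculation. The sketch below
   reproduces it; the function name and parameter names are
   illustrative only.

```python
def offset_field_size(max_packetization_delay_us, sample_period_us):
    """Return (ticks, bits) for the per-block timestamp offset field."""
    ticks = max_packetization_delay_us // sample_period_us
    # Number of bits needed to represent any offset from 0 to `ticks`.
    return ticks, ticks.bit_length()
```

   With a 100ms (100,000us) bound and a 125us clock this gives 800
   ticks and a 10 bit field, matching the figures above.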

4.2.2 More Restrictive Case

   As a more restrictive case, we allow blocks to be present in a packet
   only if their frame sizes are identical and aligned in time. Note
   that this does not imply identical codecs or identical block sizes in
   terms of bytes; many voice codecs operate with a 20ms or 50ms frame
   size. This case would allow all frames of the same size and time
   alignment, independent of the codec, into a packet.

   This simplifies the timing issue tremendously. Now, the scenario is
   much more like the single user application. The sequence numbers and
   the frame size completely determine the timing when at least one user
   is active. But, when all users enter silence, a global timestamp is
   needed to indicate the duration of the silence period. The global
   timestamp is sufficient to reconstruct the timing in the face of
   losses. Therefore, in this case, only a global timestamp is required.

   It is desirable to support a variety of different frame sizes within
   such an aggregated connection, however. The way to do this in this
   case is to simply mandate that different packets can contain
   different frame sizes; the only restriction is within a packet. This
   is not as simple as it may seem at first. Once this is done, the
   relationship between sequence numbers and timing is lost. Consider an
   example. There are two frame sizes, 10ms and 30ms. Packets N, N+1,
   and N+2 contain 10ms frames; however, packet N+3 contains 30ms
   frames. Thus, although the difference in sequence number between the
   first and fourth is three, the relative timing is neither 10ms*3 nor
   30ms*3. Due to this fact, the measurement of jitter is complicated
   (for the same reasons described in Section 4.2.1), as it should not
   be done between two packets with different frame sizes. It also makes
   recovery techniques based on sequence number more complex. To resolve
   this problem, we use a natural concept in RTP, which is the synchro-
   nization source (SSRC). The approach is to have a separate SSRC for
   each frame size in use. Then, sequence numbers are interpreted for
   each SSRC separately. This resolves the problem with the relationship
   between timing and sequence numbering. It also makes jitter and delay
   computations simpler - they are now done for each SSRC separately.
   Furthermore, multiple jitter (and delay, loss, etc.)  values are
   reported to the source, one for each frame size. This is also desir-
   able, since the different frame sizes will cause different packetiza-
   tion delays and packet sizes, which may cause those packets to see
   different delays and losses in the network than other packets.

   This case has both advantages and drawbacks when compared to the gen-
   eral case. As an advantage, timing is greatly simplified, and the
   approach falls much in line with the original intentions of RTP. How-
   ever, it causes losses in efficiency for systems with a variety of
   different frame sizes in operation simultaneously. Such a situation
   arises naturally when flow control is applied to each source individ-
   ually, as opposed to altering the rate and codec type for all of the
   active sources.

4.3 Channel ID

   The question of channel identification may seem at first trivial -
   simply use a 32 bit number, much like the SSRC, and be done with it.
   However, 32 bits adds significant overhead. Reduction of the number
   of bits for the channel ID becomes a complex issue. Unlike the single
   user case, the connection may remain active for long periods of time
   (days or months). The result is that channel IDs will need to be
   reused during the lifetime of the connection. It is critical to
   ensure that data from different channels is not confused because of
   this. A large channel ID space helps to reduce this problem (although
   it cannot eliminate it), so a side effect of reducing the number of
   channel IDs possible is an increase in the likelihood of such
   confusion.

   The first question to be addressed is how many simultaneous users can
   one expect to find in a single packet.

4.3.1 Number of Users

   There are several ways to come up with some minimums and maximums.

4.3.1.1 Delay-bound

   Clearly, as we add more users, the store and forward delays increase
   since the packet size gets larger. Therefore, if we bound the per-hop
   delay, and provide a lower bound on the codec bitrate and packetiza-
   tion delay, an upper bound on the number of users can be obtained.
   Consider a 2.4 kbps codec, with a 20ms frame size. This is a reason-
   able minimum combination. Next, consider 50ms store and forward
   delays. For a T1, this limits the number of users within a packet to
   965. For a T3, it is 30 times this, or nearly 29,000. If silence
   suppression is used, the number of users within a packet is roughly
   half the number of active users (on average), thus requiring twice as
   many channel identifiers (1930 and 58,000). This bound doesn't seem
   too tight. Intuitively, even 965 seems too large.
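
   The delay bound can be reproduced with a few lines of arithmetic.
   The text does not state the link rate or the per-user overhead it
   used; the sketch below assumes the nominal T1 rate of 1.544 Mb/s and
   32 bits of per-user overhead (the figure used for the efficiency
   bound), and ignores the shared packet header. Under those
   assumptions it yields the 965 quoted above. All names are
   hypothetical.

```python
T1_BITS_PER_SECOND = 1_544_000  # nominal T1 line rate (assumption)


def max_users_delay_bound(link_bps, max_delay_ms, codec_bps, frame_ms,
                          per_user_overhead_bits=32):
    """Largest number of blocks whose transmission fits in max_delay_ms."""
    budget_bits = link_bps * max_delay_ms // 1000      # bits sent in the delay budget
    block_bits = codec_bps * frame_ms // 1000 + per_user_overhead_bits
    return budget_bits // block_bits
```

   For the 2.4 kbps codec with 20ms frames, each block is 48 + 32 = 80
   bits, and 50ms on a T1 carries 77,200 bits, giving 965 users.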

4.3.1.2 Efficiency bound

   The entire purpose of multiplexing is to improve upon efficiency.
   Therefore, we should be able to support at least as many users as is
   necessary to get good efficiency. Consider a typical case, a 16 kbps
   codec, with a 20ms packetization delay. This results in 320 bits of
   data per user. If we assume IP/UDP/RTP (20+8+12=40 bytes = 320 bits),
   plus an additional word (32 bits) of overhead per user, the effi-
   ciency vs. N becomes:

   E = 320N / ((320 + 32)N + 320)

   As N grows, this approaches an asymptote of roughly 90 percent.
   Reaching, say, 88 percent of this asymptote requires about 7 users
   per packet, so that we must support at least 14 active channels
   (again, due to statistical multiplexing). The lower bound, therefore,
   on the number of users is around 14.
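
   The efficiency expression can be evaluated numerically to see how
   quickly it approaches its asymptote of 320/352. The sketch below is
   illustrative only; the function names are hypothetical, and the
   target fraction of the asymptote is a free parameter chosen by the
   caller.

```python
def efficiency(n, payload_bits=320, per_user_overhead_bits=32, header_bits=320):
    """E = payload*N / ((payload + overhead)*N + header)."""
    return payload_bits * n / (
        (payload_bits + per_user_overhead_bits) * n + header_bits)


def min_users_for_fraction(fraction, payload_bits=320,
                           per_user_overhead_bits=32, header_bits=320):
    """Smallest N whose efficiency reaches `fraction` of the asymptote."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    asymptote = payload_bits / (payload_bits + per_user_overhead_bits)
    n = 1
    while efficiency(n, payload_bits, per_user_overhead_bits,
                     header_bits) < fraction * asymptote:
        n += 1
    return n
```

   With a target of 0.88 of the asymptote, the search returns 7 users
   per packet, which doubles to 14 active channels once silence
   suppression is taken into account.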



4.3.1.3 MTU Bound

   In many cases, there is a maximum packet size. This is usually around
   1500 bytes. If we consider a very low bitrate codec, the minimum
   block size from any particular user is 32 bits (otherwise, overheads
   become very large, and we lose word alignment, so 32 bits is a good
   minimum). Dividing 1500 bytes by 4 bytes, we obtain a maximum of 375
   users. Multiplying by two, the number of active channels needed is
   around 750.

   Based on these bounds, we need to simultaneously support at least 10
   users, and at most 750. This would imply that at least 8 to 10 bits
   of channel ID are required.
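
   The MTU bound and the resulting field width follow from simple
   arithmetic, sketched below. The function and parameter names are
   illustrative; the factor of two is the silence suppression
   assumption used throughout this section.

```python
def channel_id_bits(mtu_bytes=1500, min_block_bytes=4, activity_factor=2):
    """Return (users_per_packet, active_channels, id_bits) for the MTU bound."""
    users_per_packet = mtu_bytes // min_block_bytes
    active_channels = users_per_packet * activity_factor
    # Bits needed to give each active channel a distinct identifier.
    id_bits = (active_channels - 1).bit_length()
    return users_per_packet, active_channels, id_bits
```

   With the default values this gives 375 users per packet, 750 active
   channels, and a 10 bit channel ID.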

4.3.2 Channel ID Reuse Problem

   It is important to guarantee that data from a particular channel is
   never routed to a different channel; this would mean that a user may
   hear pieces of conversations from different users, an error we con-
   sider catastrophic. Such misrouting becomes possible when a channel
   is torn down, and a new channel is set up soon after using the same
   channel ID. Such a scenario is depicted in Figure 4. Sometime after
   channel K is torn down, a new channel is set up using the same chan-
   nel ID, K. If the data packets (dotted lines) are being delayed sig-
   nificantly, blocks from the old channel K may still be present in the
   data stream after the new channel K is established. These blocks will
   then be played out to the new user of channel K. Protocol support is
   needed to guarantee that this can never happen.




   The solution lies in an intelligent signaling protocol. The protocol
   must support a two-way handshake for all control messages. In
   addition, three simple rules must be obeyed at a source when setting
   up or tearing down channels:

     1.   When a source sends a teardown message, it stops sending data
          in the UDP stream for that channel. Furthermore, in the
          signaling message, it indicates the sequence number of the
          packet which contained the last block for that channel; call
          this sequence number K.

     2.   A source cannot re-use a channel identifier until it has
          received an acknowledgement from the destination that that
          particular channel was successfully torn down.



                         |                            |
                      t1 |------------- teardown K    |
                         |.            --------------X|
                         |  .old K data               |t2
                         |    .                -------|
                         |  ACK TD K  ---------       |
                      t3 |X-----------                |
                         |        .                   |
                         |          .                 |
                         |------------- setup K       |
                         |            .--------------X|
                         |              .......       |t4
                         |   ACK SET K  --------------|
                         |X-------------       ....   |
                         |......                   ..X|
                         |      ........data new K    |
                         |              .............X|

       Figure 4: Channel ID Reuse Problem


     3.   A source cannot begin to send data from a particular channel
          in the UDP stream until it has received an acknowledgement
          from the destination that the setup is complete.

   A few simple rules must also be used at the receiver:

     1.   When a receiver gets a teardown message, it checks the highest
          SN received so far (call this sequence number M). If M >= K,
          the channel is torn down, and any further blocks containing
          that channel ID are discarded. If M < K, blocks from that
          channel are accepted until the received SN reaches or exceeds
          K. Once this happens, the channel is torn down and no further
          blocks with that channel ID are accepted.

     2.   When a setup message is received, the destination will begin
          to accept blocks with the given channel identifier, but only
          if the sequence numbers of the packets in which they ride are
          greater than K.

   The use of the sequence numbers allows the receiver to separate the
   old channel K blocks from the new ones. This guarantees that the
   destination will not misroute packets. An additional benefit is that
   the end of speech will not be clipped if the last data packets arrive
   after the teardown is received. This protocol is quite simple to
   implement, although it requires a table at the receiver of the values
   of K for each channel ID.
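
   The receiver rules can be sketched as a small state machine. This is
   a minimal illustration, not a protocol definition: the class, its
   method names, and the three state labels are hypothetical, sequence
   number wraparound is ignored, and the table of K values is the
   dictionary last_sn.

```python
OPEN, DRAINING, CLOSED = "open", "draining", "closed"


class ChannelReceiver:
    """Receiver-side channel ID bookkeeping for one connection."""

    def __init__(self):
        self.highest_sn = -1   # M: highest packet SN received so far
        self.state = {}        # channel ID -> OPEN / DRAINING / CLOSED
        self.last_sn = {}      # channel ID -> K from the latest teardown

    def on_setup(self, channel_id):
        # Blocks are accepted again, but only from packets with SN > K.
        self.state[channel_id] = OPEN

    def on_teardown(self, channel_id, k):
        self.last_sn[channel_id] = k
        # If packet K (or a later one) was already seen, close at once;
        # otherwise keep accepting the tail of the old channel.
        self.state[channel_id] = CLOSED if self.highest_sn >= k else DRAINING

    def accept_block(self, sn, channel_id):
        """Return True if a block from packet `sn` should be played out."""
        self.highest_sn = max(self.highest_sn, sn)
        k = self.last_sn.get(channel_id, -1)
        state = self.state.get(channel_id, CLOSED)
        if state == OPEN:
            return sn > k
        if state == DRAINING:
            if self.highest_sn >= k:
                self.state[channel_id] = CLOSED
            return sn <= k
        return False
```

   In this sketch a late block from the old channel that arrives after
   the channel ID has been set up again is discarded, since its packet
   sequence number is not greater than K.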

   Alternate solutions to this reuse problem exist which can operate
   when the above restrictions are relaxed. The simplest approach is to
   have the source keep a linked list of free channel IDs. The list is
   initialized to contain all channel IDs, in order. When a new channel
   is required to be established, the channel ID is taken from the top
   of the list. When a channel is torn down, its ID is placed at the
   bottom of the list. This makes the time between channel ID reuse as
   long as possible, and reduces the probability of confusion. With this
   method, it is no longer necessary to include sequence numbers in the
   tear down messages. Also, the receiver does not need to maintain a
   table.
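
   The free list amounts to a first-in, first-out queue of channel IDs.
   A minimal sketch, using a double-ended queue in place of the linked
   list, is shown below; the class and method names are hypothetical.

```python
from collections import deque


class ChannelIdPool:
    """FIFO pool of channel IDs that maximizes the time before reuse."""

    def __init__(self, id_bits):
        # Initially every channel ID is free, in increasing order.
        self._free = deque(range(1 << id_bits))

    def allocate(self):
        """Take the ID at the top of the list for a new channel."""
        if not self._free:
            raise RuntimeError("no free channel IDs")
        return self._free.popleft()

    def release(self, channel_id):
        """Return the ID of a torn down channel to the bottom of the list."""
        self._free.append(channel_id)
```

   A released ID is handed out again only after every other free ID has
   been used, which is what spaces reuse as far apart as possible.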

4.3.3 Channel ID Coding

   This section discusses some of the options for coding the channel ID
   field.

4.3.3.1 Fixed Length

   The fixed length approach is the most straightforward. A fixed number
   of bits is assigned to the channel ID. Issues surrounding the number
   of bits required have been discussed above.

4.3.3.2 Implicit + Present Mask

   In reality, the channel IDs are very redundant. Both source and des-
   tination know the set of active connections and their channel identi-
   fiers from the signalling messages. Therefore, if the blocks are
   placed in the packet in order of increasing channel ID, very little
   information actually needs to be sent. In fact, without silence sup-
   pression, channel activity and the presence of a block in a packet
   are likely to be equivalent, in which case NO information actually
   needs to be sent about channel IDs.

   Unfortunately, there are some practical problems with this. First,
   silence suppression is used. Secondly, even if it weren't, it is pos-
   sible for the voice codecs at the ITG not to have their framing syn-
   chronized (as in the general case above), so that a packet may not
   contain data from all users. Thirdly, the source and destination do
   NOT have a consistent view of the state of the system. There is a
   delay while signaling messages are in transit.

   A few simple mechanisms can be used to overcome these complexities.
   In the header of the packet, a mask is sent. Each bit in the mask
   indicates whether data from a channel is present in the packet or
   not. Mapping of channel IDs to bits is done by sorting the channel
   IDs, and mapping the lowest number to the first bit, the next lowest
   to the second, etc. If a channel has no data for that packet, its
   bit is set to zero. Given that the source and destination agree on
   how many connections are active at all points in time, the number of
   bits required is known to both sides.
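
   The mapping between sorted channel IDs and mask bits can be sketched
   as follows. The function names build_mask and channels_in_packet are
   hypothetical, and the mask is modelled as a list of 0/1 values
   rather than packed bits, purely to keep the example short.

```python
def build_mask(active_ids, present_ids):
    """Return one bit per active channel, lowest channel ID first."""
    present = set(present_ids)
    return [1 if cid in present else 0 for cid in sorted(active_ids)]


def channels_in_packet(active_ids, mask):
    """Recover the channel IDs whose blocks are present, in block order."""
    ordered = sorted(active_ids)
    if len(mask) != len(ordered):
        raise ValueError("mask length must equal the number of active channels")
    return [cid for cid, bit in zip(ordered, mask) if bit]
```

   Both ends must hold the same set of active channel IDs for the mask
   to be interpreted correctly, which is the state agreement problem
   taken up next.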

   The next step is to deal with the differences in state. An additional
   field, called the state-number, perhaps 5 bits, is sent in the header
   of the packet. This field starts at zero. Let's say at some point in
   time, its value is N. The source wishes to tear down a channel. It
   sends the tear down message to the destination, but continues to send
   data for that channel (or it may choose to send nothing, but must set
   the appropriate bit in the mask to zero). When the destination
   receives the message, it replies with an acknowledge. When the
   acknowledge is received by the source, the source considers the chan-
   nel torn down, and no longer sends data for it, nor considers it in
   computing the mask. In the packet where this happens, the source also
   increments the state-number field to N+1. The destination knows that
   the source will do this, and will therefore consider the state
   changed for all packets whose value of the field is N+1 or greater.
   When the next signaling message takes effect, the field is further
   increased. Even if packets are lost, the value of the state-number
   field for any correctly received packet completely tells the destina-
   tion the state of the system as seen in that packet. Furthermore, it
   is not necessary to wait for a particular setup or teardown to be
   acknowledged before requesting another setup or teardown.
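
   A minimal sketch of the destination side of this mechanism is given
   below. It assumes the destination records, for each state-number,
   the set of channels in effect from that number onwards. The class
   and method names are hypothetical, and wrap-around of the
   state-number field is deliberately ignored.

```python
class ReceiverState:
    """Channel sets indexed by state-number (illustrative only)."""

    def __init__(self, initial_channels=()):
        self._sets = {0: frozenset(initial_channels)}
        self._latest = 0

    def apply_signal(self, add=(), remove=()):
        """Record the channel set that takes effect at the next state-number."""
        current = self._sets[self._latest]
        self._latest += 1
        self._sets[self._latest] = (current | frozenset(add)) - frozenset(remove)
        return self._latest

    def channels_for(self, state_number):
        """Sorted channel IDs to use when interpreting a packet's mask."""
        return sorted(self._sets[state_number])
```

   A late packet still carrying the old state-number is then decoded
   against the old channel set, even after a teardown has taken effect.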

   The number of bits for the state-number field should be set large
   enough to represent the maximum number of state changes which can
   have taken effect during a round trip time. As an alternative, an
   additional exchange can occur. After the destination receives a
   packet with state number greater than N, it destroys the state
   related to N, and sends back, reliably, a free-state N message,
   indicating to the source that state N is now de-allocated, and can
   be used again. Until such a message is received, the source cannot
   reuse state N. This is essentially a window based flow control, where
   the flow is equal to changes in state. With this addition, the number
   of bits for the state number can be safely reduced, and it is
   guaranteed that the destination will never confuse the state,
   independent of the number of state-number bits used. However, the use
   of too few state bits can cause call blocking or delay the teardown
   of inactive channels.

   This problem in state difference appears to be similar to the channel
   ID reuse problem described in Section 4.3.2. However, there is an
   important difference. In the channel ID reuse problem, if the packet
   containing the last block of a user arrives before the signaling mes-
   sage tearing down that connection, there is no problem. The destina-
   tion will generally play out silence until the signaling message is
   received. Here, however, the destination must know that blocks are no
   longer present in the data stream independent of when the signaling
   messages arrive.


   There are some drawbacks to this approach. It requires the source
   and destination to maintain state. Any error in processing at either
   end, or a hardware failure, causes a complete loss of
   synchronization. This hard-state nature of the protocol can be
   relaxed by having the source send the complete state of the system
   with each signaling message, along with the state-number field for
   which this state takes effect. This guarantees that even in the event
   of end-system failure, the system state will be refreshed whenever a
   new connection is set up or torn down. Furthermore, the state can be
   sent periodically to improve performance.

4.4 Length Indicators

   There are many ways to actually code the length indicators. The first
   question, however, is the range of lengths which must be coded.

4.4.1 Range of Length Indicators

   Here, there is a clear tradeoff between flexibility and efficiency. A
   larger range can accommodate a variety of different media (such as
   video) where lengths may be large. However, this comes at the expense
   of a long length field, which may require another word of header to
   hold. For voice, one would expect a maximum bitrate of 64 kbps, and
   around 50 ms packetization delay. This yields 3200 bits (400 bytes),
   or exactly 100 32 bit words, of data. Therefore, an eight bit field
   counting words is probably sufficient for most voice applications.

4.4.2 PTI Based Lengths

   In many applications, the amount of data present depends on the voice
   codec in use. Frame based coders will generally send a frame at a
   time. Since the codec type is indicated by the PTI field, it may not
   always be necessary to send length information at all. Even for non-
   frame based codecs, such as PCM, default data sizes can be set in the
   standard (as in RFC 1890 [6]). An extension bit can be used to indi-
   cate a non-standard length, so that when set, a length field follows.
   This allows for efficient coding of the most common cases, but allows
   for variable lengths with little additional cost.

4.4.3 Variable Length w/ Indicator

   In this approach, a variable length header is used. All of the length
   indicators for all of the blocks are placed together in the beginning
   of the packet. The first four bits of this header field indicate the
   number of bits used for each length field. What follows are the
   length fields themselves, each using the number of bits indicated by
   the first four bits. This approach scales well, using a small
   overhead when the block lengths are small, and a larger overhead when
   they are larger. The drawback is a variable length header field, plus
   additional complexity in the parsing. An example of this technique is
   depicted in Figure 5. In the first example, the four bit indicator
   field has a value of three, so that the length fields are all three
   bits long. The four lengths are then 2, 6, 3, and 4. In the second
   example, the four bit indicator has a value of two, so that the
   length fields are all two bits long. The four lengths are thus 3, 2,
   1, and 3.



                                   4b   3b 3b  3b  3b
                                  --------------------
                    Example 1   |0011|010|110|011|100|
                                  --------------------

                                     4b  2b 2b 2b 2b
                                    ----------------
                      Example 2   |0010|11|10|01|11|
                                    ----------------




   Figure 5: Variable Length w/ Indicator
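
   The coding drawn in Figure 5 can be sketched as follows, with bit
   strings standing in for the packed header. The function names are
   hypothetical, and the sketch assumes the receiver already knows how
   many length fields to expect, which this section does not specify.

```python
def encode_lengths(lengths):
    """Four bit width indicator followed by fixed width length fields."""
    if not lengths:
        raise ValueError("at least one length is required")
    width = max(1, max(lengths).bit_length())
    if width > 15:
        raise ValueError("width does not fit in the four bit indicator")
    return format(width, "04b") + "".join(format(n, f"0{width}b") for n in lengths)


def decode_lengths(bits, count):
    """Read `count` length fields whose width is given by the first four bits."""
    width = int(bits[:4], 2)
    if len(bits) < 4 + count * width:
        raise ValueError("header too short")
    return [int(bits[4 + i * width:4 + (i + 1) * width], 2) for i in range(count)]
```

   Decoding the two bit patterns of Figure 5 with this sketch gives the
   lengths 2, 6, 3, 4 and 3, 2, 1, 3 respectively.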

4.4.4 Remaining Packet Length Based Lengths

   UDP always informs RTP of how many bytes are in the payload. This
   itself restricts the possible length of the first block, since its
   length cannot exceed the total packet length minus the RTP header.
   Furthermore, as each block is placed into the packet, the possible
   set of lengths that it can have shrinks - it can never exceed the
   remaining length in the packet. This approach, therefore, codes each
   length field with only as many bits as are needed to represent the
   amount of data remaining in the packet (roughly log2 of that
   amount). This approach works extremely well when there is a long
   block followed by several shorter ones, whereas the previous approach
   performs poorly in this case. Furthermore, it eliminates the length
   indicator present in the previous approach. However, it is even more
   complex than the previous technique. It can result in no savings
   under some conditions, especially since the header fields must be
   rounded to 32 bits.

   Consider an example. The total size of the packet is 31 words. Inside
   of it are three blocks, the first whose length is 17, the second 8,
   and the third, 6. The first length field is coded with 5 bits, since
   up to 31 words may remain. After this block is read, the remaining
   amount of data in the packet is 14 words. Therefore, the next length
   field is coded with 4 bits. After this block, the remaining amount of
   data in the packet is 6 words, so the final length field is coded
   with three bits. The total is therefore 5+4+3 = 12 bits. In the
   previous approach (Section 4.4.3), the entire length field would have
   required 4 bits for the indicator (whose value would be 5), followed
   by 3 five bit fields, for a total of 19 bits.

   One may question this example since the overhead of the length fields
   itself is not taken into account when computing the remaining length
   of the packet. While this can be incorporated, it makes things even
   more complex, and it is not actually necessary. All that is required
   is that the length fields are coded with log2(M), where M is any
   bound on the remaining amount of data which can be deterministically
   computed from past information. A simple bound is the packet length
   minus the data seen thus far (one can also subtract away any fixed
   length fields), precisely the metric used in the example above.
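
   The field widths in the example can be reproduced with the short
   sketch below. It takes the bound M to be the packet length minus the
   data seen so far, and interprets "log2(M)" as the number of bits
   needed to write M in binary. Both function names are hypothetical.

```python
def remaining_length_widths(total_words, block_lengths):
    """Width in bits of each length field, given the words still unread."""
    remaining = total_words
    widths = []
    for length in block_lengths:
        if length > remaining:
            raise ValueError("block does not fit in the remaining packet")
        # Bits needed to represent any value from 0 up to `remaining`.
        widths.append(max(1, remaining.bit_length()))
        remaining -= length
    return widths


def indicator_approach_bits(block_lengths):
    """Cost of the four bit indicator plus equal width fields (Figure 5)."""
    width = max(1, max(block_lengths).bit_length())
    return 4 + width * len(block_lengths)
```

   For the 31 word packet with blocks of 17, 8 and 6 words, the first
   function gives widths of 5, 4 and 3 bits (12 in total) and the
   second gives 19 bits.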

4.4.5 Table Based Approach

   Realistically, most systems will operate with codecs that generate
   data in a fixed set of lengths (a frame size, for example). In that
   case, the set of lengths which can appear in the packet are usually
   very restricted. To take advantage of this fact, a table can be
   transmitted to the receiver reliably before transmission commences.
   This table can indicate the actual length of a block, and its coding.
   The symbols transmitted in the data packets are then used in this
   table to look up the actual lengths. This can reduce the length field
   to 2 or 3 bits. These lengths then all occur next to each other in
   the header. The technique now relies on state at the receiver, and
   the parsing process is further complicated by table lookups. In addi-
   tion, the approach only works if you know the set of lengths before
   the system begins operation. If you allow the table to be dynamically
   modified during a session, synchronization problems occur, and the
   system becomes quite complex.

   Further gains can be achieved through the use of Huffman codes
   instead of fixed length codes. This only makes sense when different
   codecs (and correspondingly different lengths) are used with
   different frequencies. An example of such a situation is when the
   codec changes to a higher rate because of music-on-hold; a rare event
   in general.

4.5 Marker Bit

   The marker bit has a general functionality, but is normally used to
   indicate the beginning of a talkspurt. It seems like a good idea to
   include this bit for each user.



4.6 Location of Per User Overhead

   There will generally be overhead on a per-user basis (information
   such as channel ID, length, etc.). This information can be located in
   one of three places. First, it can all reside in front of the block
   to which it is applicable. Second, it can all be pasted together and
   reside up front in the header of the packet. The third is a hybrid
   solution, where some of it resides up front (such as channel ID), and
   some resides in front of the data. There are various pros and cons to
   the different approaches. The hybrid approach can be complex, since
   data is split into multiple places. The case where all the header is
   up front has a few minor advantages. First, it allows for a complete
   separation of the data from the header. Second, the implementation is
   likely to be a little less complex, since extracting blocks does not
   require actually moving through the payload.

5 Options

5.1 Option I: Mixer Based

   This option is the most straightforward to implement, but has the
   most overhead. The basic premise is to reuse the mixer concept intro-
   duced in RTP. Each user is considered a contributing source, and the
   gateway is considered a mixer. However, instead of mixing the media,
   separate data from each user appear in the payload. The 32 bit CSRC
   identifies each user, acting as the channel ID. Data from each user
   is organized into blocks. Each block has its own 32 bit header, which
   includes the length (12 bits) in units of 32 bit words, Marker bit
   (1b), TimeStamp Offset (12b), and Payload Type (7b). Furthermore, the
   payload type and marker bit are stricken from the RTP header (since
   they only make sense for an individual user), and the CC field is
   expanded to fill the eight bits thus freed. This allows for a 12 bit
   CC field, or up to 4095 users in a packet. Thus, the packet would
   look like:



         0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |V=2|P|X|          CC           |       sequence number         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           timestamp                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           synchronization source (SSRC) identifier            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           contributing  source (CSRC) identifier  1           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           contributing  source (CSRC) identifier  2           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                              ..........
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           contributing  source (CSRC) identifier  N           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           Length      |      Timestamp Offset |M|     PT      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                        Payload 1                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           Length      |      Timestamp Offset |M|     PT      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                        Payload 2                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+



   Figure 6: Option I

   This approach allows for the most amount of generality in terms of
   variable length coders and coders with different frame sizes (see
   Section 4.3.1). The channel ID is longer than necessary, but using
   the concept of a contributing source for the channel ID necessitates
   the use of the additional bits. There are several variations on
   option I, many of which have been mentioned above:

   I.A: Put the CSRC with each 32 bit length+M+PT field, instead of all
   of them being at the beginning. This has some pros and cons. As an
   interesting artifact of this change, it is no longer necessary to
   have a CC field. The length passed up by UDP is sufficient to recover
   the point at which to stop checking for additional blocks from users
   in the payload. In fact, the length field in the last block is not
   strictly necessary either.

   I.B: Do the opposite of I.A. Put the length+M+PT field up front along
   with the CSRC fields, with the pattern being CSRC 1, length 1, CSRC
   2, length 2, etc. Here again, the CC field is not strictly necessary.

   I.C: The CSRC field can be shrunk to 8 bits. This allows four channel
   IDs to be coded in the space of one word, whereas only one fits at
   the current size of the field.

   I.D: The CSRC field can be shrunk to 16 bits. This allows two channel
   IDs to be coded in the space of one word.
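
   The 32 bit per-block header of Option I can be sketched as follows.
   The bit order used here (Length, Timestamp Offset, Marker, Payload
   Type, most significant bits first) is an assumption taken from the
   block header row drawn in Figure 6, and the function names are
   hypothetical.

```python
import struct


def pack_block_header(length_words, ts_offset, marker, payload_type):
    """Pack Length(12) | Timestamp Offset(12) | M(1) | PT(7) into one word."""
    if not (0 <= length_words < 4096 and 0 <= ts_offset < 4096
            and marker in (0, 1) and 0 <= payload_type < 128):
        raise ValueError("field out of range")
    word = (length_words << 20) | (ts_offset << 8) | (marker << 7) | payload_type
    return struct.pack("!I", word)  # network byte order


def unpack_block_header(data):
    """Return (length_words, ts_offset, marker, payload_type)."""
    (word,) = struct.unpack("!I", data[:4])
    return (word >> 20) & 0xFFF, (word >> 8) & 0xFFF, (word >> 7) & 1, word & 0x7F
```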

5.2 Option II: One word header

   This option eliminates the large channel ID field present in the
   previous option. In the RTP header, the CC field is set to zero, the
   marker bit has no meaning, and the payload type is TBD (possible uses
   include an indication of the number of blocks in the packet). The RTP
   timestamp corresponds to the generation of the first sample, among
   all blocks, enclosed in this packet. A one word header precedes each
   block of data. The number of blocks is known by parsing them until
   the end of the RTP packet. The one word field has a channel ID (8
   bits), length (8 bits), Marker (1 bit), timestamp offset (11 bits),
   and payload type (4 bits). Channel ID number 255 is reserved, and
   causes the header to be expanded to allow for greater length, payload
   type, and possibly channel ID encodings. The specific format for this
   expanded header is for further study. Given the compacted payload
   type space, it may be a good idea to allow negotiation of the meaning
   for the payload type at the beginning of the connection. It may be
   worthwhile to expand the length field at the expense of the channel
   ID - this issue is for further study.

   The format of the packet is thus:



         0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           timestamp                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           synchronization source (SSRC) identifier            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |    Length     |    Timestamp Offset |    CID        |M|  PTI  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                        Payload 1                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |    Length     |    Timestamp Offset |    CID        |M|  PTI  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                        Payload 2                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+





   Figure 7: Option II
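
   Parsing blocks "until the end of the RTP packet" can be sketched as
   below. The bit positions follow the row drawn in Figure 7 (Length,
   Timestamp Offset, CID, M, PTI). Two assumptions are made that the
   text does not state: the length is counted in 32 bit words and does
   not include the one word header. The function name is hypothetical.

```python
import struct


def parse_option2_payload(payload):
    """Split an Option II payload (after the RTP header) into blocks."""
    blocks, offset = [], 0
    while offset < len(payload):
        if offset + 4 > len(payload):
            raise ValueError("truncated block header")
        (word,) = struct.unpack_from("!I", payload, offset)
        cid = (word >> 5) & 0xFF
        if cid == 255:
            # Reserved: the expanded header format is for further study.
            raise NotImplementedError("expanded header not defined")
        length_words = word >> 24
        start = offset + 4
        end = start + 4 * length_words
        if end > len(payload):
            raise ValueError("block runs past the end of the packet")
        blocks.append({
            "cid": cid,
            "length_words": length_words,
            "ts_offset": (word >> 13) & 0x7FF,
            "marker": (word >> 4) & 1,
            "pti": word & 0xF,
            "data": payload[start:end],
        })
        offset = end
    return blocks
```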



5.3 Option III - Restricted Case

   Option II has the advantage of being able to support multiple frame
   sizes within a single packet. However, it comes at the expense of a
   32 bit header (which can be large for low bitrate codecs), and at a
   reduced payload type field. This option has a 16 bit header, but does
   not support different frame sizes within a packet. It therefore falls
   into the category described in Section 4.3.2. Of the 16 bit header,
   the first bit is an expand bit (to be described shortly), and the
   second bit is the marker bit. The following 6 bits indicate payload
   type, and the remaining 8 are for channel ID. When the expand bit is
   set, an additional 16 bits are present, which indicate the length of
   the block. When expand is clear, the length is derived from the pay-
   load type. Since there is no timestamp offset, all the blocks in the
   packet must be time aligned and have the same frame lengths. Differ-
   ent sized frames are supported by using a different SSRC for each
   frame length (see Section 4.3.2). In the RTP header, the CC field is
   always zero. The marker bit and payload type are undefined. The
   timestamp indicates the time of generation of the first sample of
   each block. SSRC is randomly chosen, but always different for each
   frame size.

   The block headers are all located at the beginning of the packet, and
   follow each other. If the total length of the fields is not a
   multiple of 32 bits, it is padded out to the next 32 bit boundary.
   The structure of the header is such that fields never break across
   word boundaries. An example of such a packet is given in Figure 8.
   There are 7 blocks in this example. The first two have standard
   lengths based on the PT field. The next one uses the expansion bit to
   indicate the length. The fourth uses the PT field, the fifth the
   expansion bit, and the last two use the PT field. The last 16 bits of
   the header are padded out.
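
   Building the header section for this option can be sketched as
   follows. The bit order E, M, PT, ID within each 16 bit header follows
   the description above. The function name, the tuple layout of its
   argument, and the use of zero bytes for padding are assumptions made
   for the example.

```python
import struct


def build_option3_headers(blocks):
    """blocks: iterable of (marker, payload_type, channel_id, length_or_None).

    A length of None means the length is derived from the payload type,
    so the expand bit is left clear and no length field is emitted.
    """
    out = bytearray()
    for marker, pt, cid, length in blocks:
        if marker not in (0, 1) or not 0 <= pt < 64 or not 0 <= cid < 256:
            raise ValueError("field out of range")
        expand = 0 if length is None else 1
        out += struct.pack("!H", (expand << 15) | (marker << 14) | (pt << 8) | cid)
        if expand:
            out += struct.pack("!H", length)  # additional 16 bit length
    out += bytes(-len(out) % 4)  # pad the header section to a 32 bit boundary
    return bytes(out)
```

   Seven blocks, of which the third and fifth carry an explicit length,
   occupy 18 bytes of header fields and are padded to 20 bytes, or five
   words.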



         0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           timestamp                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           synchronization source (SSRC) identifier            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |E|M|     PT    |      ID       |E|M|    PT     |      ID       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |E|M|     PT    |      ID       |            Length             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |E|M|     PT    |      ID       |E|M|    PT     |      ID       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |         Length                |E|M|    PT     |      ID       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |E|M|     PT    |      ID       |              PAD              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                        Payload 1                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                        Payload 2                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                        Payload 3                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                        Payload 4                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                        Payload 5                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                        Payload 6                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                        Payload 7                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+



   Figure 8: Option III

5.4 Option IV - Stacked RTP

   This approach uses a duplicate of the RTP header as the per-block
   header. It is therefore extremely inefficient (12 bytes per block),
   but has several advantages: different media types can be mixed, since
   the timestamps are no longer related, and little processing is
   required if the sources being combined came from a single user RTP
   source. It also works well when one of the users is actually a mixer
   (for example, a conference bridge), since the CSRC can be used. Its
   main advantage is the reduction in overhead due to the IP and UDP
   headers. In addition to the standard RTP header, an additional header
   is required for length indication. This header has a number of 16 bit
   fields, each of which indicates a length for its corresponding block
   (including the 12 byte RTP header). The number of such 16 bit length
   fields is known by continuing to look for additional length fields
   until the total length of the packet passed up from UDP has been
   accounted for. If an odd number of such length fields is required,
   then an additional 16 bits of padding is inserted to make the length
   header a multiple of 32 bits.

   The format of such a packet is given in Figure 9.



         0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           Length 1            |         Length 2              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           Length 3            |            PAD                |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           timestamp                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           synchronization source (SSRC) identifier            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                        Payload 1                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           timestamp                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           synchronization source (SSRC) identifier            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                        Payload 2                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+





   Figure 9: Option IV
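
   The rule for finding the number of length fields can be sketched as
   below: fields are read until the length header (with padding) plus
   the blocks claimed so far exactly account for the UDP payload. The
   sketch assumes lengths are counted in bytes, which the text does not
   state, and the function name is hypothetical.

```python
import struct


def split_stacked_rtp(packet):
    """Return the stacked RTP blocks (each with its own 12 byte header)."""
    total = len(packet)
    lengths = []
    while True:
        n = len(lengths)
        header_bytes = 2 * (n + n % 2)  # 16 bit fields, padded to 32 bits
        claimed = header_bytes + sum(lengths)
        if n and claimed == total:
            break  # the whole packet has been accounted for
        if claimed > total or 2 * n + 2 > total:
            raise ValueError("length fields do not account for the packet")
        (length,) = struct.unpack_from("!H", packet, 2 * n)
        if length < 12:
            raise ValueError("block shorter than an RTP header")
        lengths.append(length)
    blocks, offset = [], header_bytes
    for length in lengths:
        blocks.append(packet[offset:offset + length])
        offset += length
    return blocks
```

   Because every block is at least 12 bytes long, the running total
   grows with each field read, so the loop either lands exactly on the
   packet length or fails.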

5.5 Option V: Compacted

   This option uses the Implicit + Mask approach outlined in Section
   4.3.3.2 to code the channel ID. In all other respects it is similar
   to Option III. Now, however, the per-block header can be reduced to
   one byte: 1 bit of expansion, 1 bit of marker, and 6 bits of payload
   type. Furthermore, the length field (present when the expansion bit
   is set) is reduced to 8 bits from 16 in Option III. This reduction
   saves on space, but it also guarantees that fields remain aligned on
   byte boundaries. The mask bits are present in the beginning of the
   packet, and they are preceded by an 8 bit state-number. If the number
   of active channels is not a multiple of 32, the mask field is padded
   out to a full word. This approach is extremely efficient, but the
   channel identification procedure is more complex and requires
   additional signaling support.

   A diagram of a typical packet for this option is given in Figure 10.
   The mask bits are indicated with lowercase m's. There are four
   active channels, each of which is present in this packet (all four
   mask bits would then be 1). The first block has a standard length,
   but the second has its expansion bit set, so that an 8 bit length
   field follows. The remaining two blocks have normal 8 bit headers.
   The last 24 bits of the header are padded to a word boundary.



   Figure 10: Option V

6 Comparison of Options

   In this section, the options are compared in terms of efficiency.
   Issues relating to complexity, scalability, and generality have
   already been discussed in previous sections. The analysis here con-
   sists of a series of tables, indicating the efficiency of each option
   for a variety of speech codecs. Several tables are included for dif-
   ferent numbers of users.

6.1 Specific Codecs

   In both Table 1 and Table 2, the efficiency vs. codec for each of the
   options is tabulated. For G.711, G.726, G.728 and G.722, the frame
   size listed is a multiple of the actual frame size of the codec,
   which is too small to be sent one at a time. The efficiency is com-
   puted as the number of words of payload such a codec would occupy,
   times the number of users, divided by the total packet size (i.e., it
   does not consider inefficiencies due to padding the payload portion).
   Note that Option V is always superior in efficiency. The efficiencies
   are generally 1 to 10 percent apart. Table 1 considers the case where
   there are 10 users, and Table 2 considers the case where there are
   24.


   Codec|rate|Frame(ms)|   I  |I.C   |I.D   |  II  | III  |  IV  | V
   G.711| 64 |   20    |93.02 |94.56 |94.12 |95.24 |96.39 |90.50 |96.84
   G.726| 32 |   20    |86.96 |89.69 |88.89 |90.91 |93.02 |82.64 |93.88


J.Rosenberg, H.Schulzrinne                                   [Page 24]


Internet Draft               RTP Mux Issues              October 1, 1998




         0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           timestamp                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           synchronization source (SSRC) identifier            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  State Num    |m|m|m|m|               Pad                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |E|M|    PT     |E|M|      PT   |E|M|     PT    |E|M|   PT      |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |E|M|    PT     |                   PAD                         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           timestamp                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                        Block 1                                |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                        Block 2                                |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                        Block 3                                |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      |                        Block 4                                |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+



   G.728| 16 |  18.75  |76.92 |81.30 |80.00 |83.33 |86.96 |70.42 |88.47
   G.729|  8 |   10    |50.00 |56.60 |54.55 |60.00 |66.67 |41.67 |69.72
   G.723|5.3 |   30    |62.50 |68.49 |66.67 |71.43 |76.92 |54.35 |79.33
   G.723|6.3 |   30    |66.67 |72.29 |70.59 |75.00 |80.00 |58.82 |82.16
   ITU  | 4  |   20    |50.00 |56.60 |54.55 |60.00 |66.67 |41.67 |69.72
   G.722| 64 |   15    |90.91 |92.88 |92.31 |93.75 |95.24 |87.72 |95.84
   GSM F| 13 |   20    |75.00 |79.65 |78.26 |81.82 |85.71 |68.18 |87.35
   IS54 |7.95|   20    |62.50 |68.49 |66.67 |71.43 |76.92 |54.35 |79.33
   IS96 |8.5 |   20    |66.67 |72.29 |70.59 |75.00 |80.00 |58.82 |82.16

                          Table 1: 10 Users




   Codec|kb/s|Frame(ms)|   I  |I.C   |I.D   |  II  | III  |  IV  | V
   G.711| 64 |   20    |94.30 |96.00 |95.43 |96.58 |97.76 |91.34 |98.26
   G.726| 32 |   20    |89.22 |92.31 |91.25 |93.39 |95.62 |84.06 |96.57
   G.728| 16 |  18.75  |80.54 |85.71 |83.92 |87.59 |91.60 |72.51 |93.37
   G.729|  8 |   10    |55.38 |64.29 |61.02 |67.92 |76.60 |44.17 |80.87
   G.723|5.3 |   30    |67.42 |75.00 |72.29 |77.92 |84.51 |56.87 |87.57
   G.723|6.3 |   30    |71.29 |78.26 |75.79 |80.90 |86.75 |61.28 |89.42
   ITU  | 4  |   20    |55.38 |64.29 |61.02 |67.92 |76.60 |44.17 |80.87
   G.722| 64 |   15    |92.54 |94.74 |93.99 |95.49 |97.04 |88.78 |97.69
   GSM F| 13 |   20    |78.83 |84.38 |82.44 |86.40 |90.76 |70.36 |92.69
   IS54 |7.95|   20    |67.42 |75.00 |72.29 |77.92 |84.51 |56.87 |87.57
   IS96 |8.5 |   20    |71.29 |78.26 |75.79 |80.90 |86.75 |61.28 |89.42

                         Table 2: 24 Users
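
   The efficiency calculation behind Tables 1 and 2 can be sketched as
   follows. The payload of one frame is rounded up to whole 32 bit
   words, multiplied by the number of users, and divided by the total
   packet size. The function names are hypothetical. The per-option
   overhead is left as a parameter, because this section does not state
   it; the value of 10 words plus 2 words per user used for Option I is
   an assumption, chosen only because it reproduces the Option I column
   of both tables.

```python
import math


def payload_words(rate_kbps: float, frame_ms: float) -> int:
    """32 bit words needed for one frame (kb/s times ms gives bits)."""
    bits = rate_kbps * frame_ms
    return math.ceil(bits / 32)


def efficiency(rate_kbps: float, frame_ms: float, users: int,
               overhead_words: int) -> float:
    """Payload words for all users over total packet words, in percent."""
    payload = payload_words(rate_kbps, frame_ms) * users
    return 100.0 * payload / (payload + overhead_words)


def option_i_overhead(users: int) -> int:
    # ASSUMPTION: 10 shared words plus 2 words per user. This is not
    # stated in the text; it is inferred because it reproduces the
    # Option I column of Tables 1 and 2.
    return 10 + 2 * users
```

   Under that assumption, efficiency(64, 20, 10, option_i_overhead(10))
   evaluates to about 93.02, the G.711 entry for Option I in Table 1,
   and efficiency(8, 10, 24, option_i_overhead(24)) to about 55.38, the
   G.729 entry for Option I in Table 2.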



7 Authors' Addresses

   Jonathan Rosenberg
   Rm. 4C-526
   Bell Laboratories, Lucent Technologies
   101 Crawfords Corner Rd.
   Holmdel, NJ 07733
   electronic mail:  jdrosen@bell-labs.com

   Henning Schulzrinne
   Dept. of Computer Science
   Columbia University
   1214 Amsterdam Avenue
   New York, NY 10027
   USA
   electronic mail:  schulzrinne@cs.columbia.edu

8 Bibliography

   [1] M. Handley and V. Jacobson, SDP: session description protocol,
   Request for Comments (Proposed Standard) 2327, Internet Engineering
   Task Force, Apr.  1998.

   [2] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, RTP: a
   transport protocol for real-time applications, Request for Comments
   (Proposed Standard) 1889, Internet Engineering Task Force, Jan. 1996.

   [3] B. Subbiah and S. Sengodan, User multiplexing in RTP payload
   between IP telephony gateways, Internet Draft, Internet Engineering
   Task Force, Aug. 1998.  Work in progress.

   [4] J. Rosenberg and H. Schulzrinne, An RTP payload format for user
   multiplexing, Internet Draft, Internet Engineering Task Force, May
   1998.  Work in progress.

   [5] K. Tanigawa, T. Hoshi, and K. Tsukada, An RTP simple multiplexing
   transfer method for internet telephony gateway, Internet Draft,
   Internet Engineering Task Force, June 1998.  Work in progress.

   [6] H. Schulzrinne, RTP profile for audio and video conferences with
   minimal control, Request for Comments (Proposed Standard) 1890,
   Internet Engineering Task Force, Jan. 1996.