MMUSIC Working Group                                        M. Willekens
Internet-Draft                                  Devoteam Telecom & Media
Intended status: Informational                          M. Garcia-Martin
Expires: January 13, 2009                                       Ericsson
                                                                   P. Xu
                                                     Huawei Technologies
                                                           July 12, 2008


Multiple Packetization Times in the Session Description Protocol (SDP):
               Problem Statement, Requirements & Solution
           draft-garcia-mmusic-multiple-ptimes-problem-03.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 13, 2009.

Abstract

   This document provides a problem statement and requirements with
   respect to the presence of a single packetization time (ptime/
   maxptime) attribute in SDP media descriptions that contain several
   media formats (audio codecs).  Furthermore, a best common practice
   solution for the use of 'ptime/maxptime' is proposed based on
   'static', 'dynamic' and 'indicated' values.  Some methods already
   proposed as ad-hoc solutions, and background information, are
   included in the appendices.



Willekens, et al.       Expires January 13, 2009                [Page 1]


Internet-Draft            Multiple ptime in SDP                July 2008


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Problem Statement  . . . . . . . . . . . . . . . . . . . . . .  4
   3.  Requirements . . . . . . . . . . . . . . . . . . . . . . . . .  5
   4.  BCP solution proposal  . . . . . . . . . . . . . . . . . . . .  6
     4.1.  Sending party RTP voice payload  . . . . . . . . . . . . .  6
       4.1.1.  ptime(s) - Static  . . . . . . . . . . . . . . . . . .  7
       4.1.2.  ptime(d) - Dynamic . . . . . . . . . . . . . . . . . .  7
       4.1.3.  ptime(i) - Indicated . . . . . . . . . . . . . . . . .  7
       4.1.4.  ptime/maxptime algorithm . . . . . . . . . . . . . . .  8
       4.1.5.  Algorithm and examples . . . . . . . . . . . . . . . .  9
         4.1.5.1.  Codec independent parameters . . . . . . . . . . .  9
         4.1.5.2.  Codec dependent parameters . . . . . . . . . . . . 10
         4.1.5.3.  Pseudocode algorithm . . . . . . . . . . . . . . . 10
         4.1.5.4.  Pseudocode examples  . . . . . . . . . . . . . . . 10
     4.2.  Receiving party RTP voice payload  . . . . . . . . . . . . 11
     4.3.  Procedures for the SDP offer/answer  . . . . . . . . . . . 11
       4.3.1.  Procedures for an SDP offerer  . . . . . . . . . . . . 11
       4.3.2.  Procedures for an SDP answerer . . . . . . . . . . . . 12
     4.4.  Advantages . . . . . . . . . . . . . . . . . . . . . . . . 12
   5.  Conclusion and next steps  . . . . . . . . . . . . . . . . . . 12
   6.  Security Considerations  . . . . . . . . . . . . . . . . . . . 13
   7.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 13
   8.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 13
     8.1.  Normative References . . . . . . . . . . . . . . . . . . . 13
     8.2.  Informative References . . . . . . . . . . . . . . . . . . 13
   Appendix A.  Related RFCs for ptime  . . . . . . . . . . . . . . . 15
   Appendix B.  Ad-hoc solutions for multiple ptime . . . . . . . . . 17
     B.1.  Method 1 . . . . . . . . . . . . . . . . . . . . . . . . . 18
     B.2.  Method 2 . . . . . . . . . . . . . . . . . . . . . . . . . 18
     B.3.  Method 3 . . . . . . . . . . . . . . . . . . . . . . . . . 19
     B.4.  Method 4 . . . . . . . . . . . . . . . . . . . . . . . . . 19
     B.5.  Method 5 . . . . . . . . . . . . . . . . . . . . . . . . . 20
     B.6.  Method 6 . . . . . . . . . . . . . . . . . . . . . . . . . 20
     B.7.  Method 7 . . . . . . . . . . . . . . . . . . . . . . . . . 20
     B.8.  Method 8 . . . . . . . . . . . . . . . . . . . . . . . . . 21
     B.9.  Method 9 . . . . . . . . . . . . . . . . . . . . . . . . . 21
     B.10. Method 10  . . . . . . . . . . . . . . . . . . . . . . . . 22
     B.11. Method 11  . . . . . . . . . . . . . . . . . . . . . . . . 22
   Appendix C.  Background info . . . . . . . . . . . . . . . . . . . 22
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 26
   Intellectual Property and Copyright Statements . . . . . . . . . . 28


1.  Introduction

   "Session Description Protocol" (SDP) [RFC4566] provides a protocol to
   describe multimedia sessions for the purposes of session
   announcement, session invitation, and other forms of multimedia
   session initiation.  A session description in SDP includes the
   session name and purpose, the media comprising the session,
   information needed to receive the media (addresses, ports, formats,
   etc.) and some other information.

   In the SDP media description part, the m-line contains the media type
   (e.g. audio), a transport port, a transport protocol (e.g.  RTP/AVP)
   and a media format description which depends on the transport
   protocol.

   For the transport protocol RTP/AVP or RTP/SAVP, the media format sub-
   field can contain a list of RTP payload type numbers.  See "RTP
   Profile for Audio and Video Conferences with Minimal Control"
   [RFC3551], Table 4.
   For example: "m=audio 49232 RTP/AVP 3 15 18" indicates the audio
   encoders GSM, G728, and G729.

   Further, the media description part can contain additional attribute
   lines that complement or modify the media description line.  Of
   interest for this memo are the 'ptime' and 'maxptime' attributes.
   According to [RFC4566], the 'ptime' attribute gives the length of
   time in milliseconds represented by the media in a packet, and the
   'maxptime' gives the maximum amount of media that can be encapsulated
   in each packet, expressed as time in milliseconds.  These attributes
   modify the whole media description line, which can contain an
   extensive list of payload types.  In other words, these attributes
   are not specific to a given codec.
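
   As a minimal sketch of this point, the following Python fragment
   parses a hypothetical media description and shows that a single
   'ptime'/'maxptime' pair attaches to the whole m-line rather than to
   one payload type.  The SDP text and the helper name parse_media are
   illustrative assumptions and are not taken from any specification.

```python
# Illustrative only: the SDP fragment and the helper name are assumptions.
SDP_MEDIA = """m=audio 49232 RTP/AVP 3 15 18
a=ptime:20
a=maxptime:60
"""


def parse_media(text):
    """Return payload types and media-level ptime/maxptime of one m-section."""
    media = {"payload_types": [], "ptime": None, "maxptime": None}
    for line in text.splitlines():
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            fields = line[2:].split()
            media["payload_types"] = [int(f) for f in fields[3:]]
        elif line.startswith("a=ptime:"):
            media["ptime"] = int(line.split(":", 1)[1])
        elif line.startswith("a=maxptime:"):
            media["maxptime"] = int(line.split(":", 1)[1])
    return media


media = parse_media(SDP_MEDIA)
# One ptime/maxptime pair covers GSM (3), G728 (15) and G729 (18) alike.
print(media)
```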

   [RFC4566] also indicates that it should not be necessary to know
   'ptime' to decode RTP or vat audio since the 'ptime' attribute is
   intended as a recommendation for the encoding/packetization of audio.
   However, once more, the existing 'ptime' attribute defines the
   desired packetization time for all the payload types defined in the
   corresponding media description line.

   End-devices can sometimes be configured with different codecs and for
   each codec a different packetization time can be indicated.  However,
   there is no clear way to exchange this type of information between
   different user agents and this can result in lower voice quality,
   network problems or performance problems in the end-devices.


2.  Problem Statement

   The packetization time is an important parameter which helps in
   reducing the packet overhead.  Many voice codecs define a certain
   frame length used to determine the coded voice filter parameters and
   try to find a certain trade-off between the perceived voice quality,
   measured by the Mean Opinion Score (MOS), and the required bitrate.
   When a packet oriented network is used for the transfer, the packet
   header induces an additional overhead.  As such, it makes sense to
   combine several voice frames in one packet, up to the Maximum
   Transmission Unit (MTU), to find a good balance between the required
   network resources, end-device resources and the perceived voice
   quality, which is influenced by packet loss, packet delay and jitter.
   When the packet size decreases, the bandwidth efficiency is reduced.
   When the packet size increases, the packetization delay can have a
   negative impact on the perceived voice quality.
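
   The trade-off can be illustrated with a short Python sketch.  It
   assumes 40 header octets per packet (IPv4, UDP and RTP without
   header compression or link-layer overhead) and a 64 kbit/s codec;
   both figures, and the function names, are assumptions chosen for
   illustration only.

```python
# Assumption: 20 (IPv4) + 8 (UDP) + 12 (RTP) header octets per packet,
# no header compression and no link-layer overhead.
HEADER_OCTETS = 20 + 8 + 12


def payload_octets(codec_bitrate_bps, ptime_ms):
    """Octets of encoded media carried in one packet of ptime_ms."""
    return codec_bitrate_bps * ptime_ms // 8000


def efficiency(codec_bitrate_bps, ptime_ms):
    """Fraction of each IP packet that is media payload."""
    payload = payload_octets(codec_bitrate_bps, ptime_ms)
    return payload / (payload + HEADER_OCTETS)


# A larger ptime improves the payload share but adds packetization delay.
for ptime_ms in (10, 20, 40, 60):
    share = efficiency(64000, ptime_ms)
    print(f"ptime={ptime_ms:3d} ms  payload share={share:.2f}")
```

   Under these assumptions, a 20 ms packet carries 160 payload octets
   and 40 header octets, so 80% of the packet is media.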

   The "RTP Profile for Audio and Video Conferences with Minimal
   Control" [RFC3551], Table 1, indicates the frame size and default
   packetization time for different codecs.  The G728 codec has a frame
   size of 2.5 ms/frame and a default packetization time of 20 ms/
   packet.  For the G729 codec, the frame size is 10 ms/frame and the
   default packetization time is 20 ms/packet.

   As more and more audio streaming traffic is carried over IP
   networks, the quality as perceived by the end-user should be no worse
   than that of classical telephony services.  For VoIP service
   providers, it is very important that endpoints receive audio with the
   best possible codec and packetization time.  In particular, the
   packetization time depends on the selected codec for the audio
   communication and other factors, such as the Maximum Transmission
   Unit (MTU) of the network and the type of access network technology.

   As such, the packetization time is clearly a function of the codec
   and the network access technology.  During the establishment of a new
   session or a modification of an existing session, an endpoint should
   be able to express its preferences with respect to the packetization
   time for each codec.  This would mean that the creator of the SDP
   prefers the remote endpoint to use a certain packetization time when
   sending media with that codec.

   SDP [RFC4566] provides the means for expressing a packetization time
   that affects all the payload types declared in the media description
   line.  So, there are no means to indicate the desired packetization
   time on a per payload type basis.  Implementations have been using
   proprietary mechanisms for indicating the packetization time per
   payload type, leading to interoperability problems.

   One of these mechanisms is the 'maxmptime' attribute, defined in
   [ITU.V152], which indicates the supported packetization period for
   all codec payload types.

   Another one is the 'mptime' attribute, defined by "PacketCable"
   [PKT.PKT-SP-EC-MGCP], which indicates a list of packetization period
   values the endpoint is capable of using (sending and receiving) for
   this connection.

   While all have similar semantics, there is obviously no
   interoperability between them, creating a nightmare for the
   implementer who happens to be defining a common SDP stack for
   different applications.

   A few RTP payload format descriptions, such as [RFC3267], [RFC3016],
   and [RFC3952], indicate that the packetization time for such payload
   should be indicated in the 'ptime' attribute in SDP.  However, since
   the 'ptime' attribute affects all payload formats included in the
   media description line, it would not be possible to create a media
   description line that contains all the mentioned payload formats and
   different packetization times.  The solutions range from considering
   a single packetization time for all payload types to creating a
   media description line that contains a single payload type.

   However, once more, if several payload formats are offered in the
   same media description line in SDP, there is no way to indicate
   different packetization times per payload format.


3.  Requirements

   The main requirement comes from the implementation and media gateway
   community making use of hardware based solutions, e.g.  DSP or FPGA
   implementations with silicon constraints for the amount of buffer
   space.

   Some implementations use the ptime/codec information to make certain
   QoS budget calculations.  When the packetization time is known for a
   codec with a certain frame size and frame data rate, the efficiency
   of the throughput can be calculated.

   Currently, the 'ptime' and 'maxptime' are "indication" attributes and
   optional.  When these parameters are used for resource reservation
   and for hardware initializations, a negotiated value between the SDP
   offerer and SDP answerer can become a requirement.

   There could be different sources for the 'ptime/maxptime', i.e. the
   RTP/AVP profile, the end-user device configuration, the network
   architecture, or the receiver.

   The codec and 'ptime/maxptime' can differ between the upstream and
   downstream directions.


4.  BCP solution proposal

   The basic idea of this proposal is to keep the packetization time
   independent from the codec and to consider the main purpose of the
   'ptime' as follows.

   The 'ptime' is a parameter indicating the packetization time which is
   an important parameter for the end-to-end delay of the voice signal
   as indicated in the previous sections.  It is defined as a media-
   attribute in the SDP.

   The only requirements for the use of 'ptime' or 'maxptime' are that
   the total size of the message fits in the MTU and that the
   packetization time is an integer multiple of the codec frame size.

   If the same session requires different kinds of streams, e.g. in a
   conference where some users have a narrowband connection and others
   have a broadband connection, different media can be defined and
   allocated to different ports.  In that case, different m-lines can be
   defined and another 'ptime' and 'maxptime' can be indicated for each
   of them.

   The IETF RFCs are not clear about what to do when the 'ptime' or
   'maxptime' in the SDP is not an integer multiple of the frame size.
   Should the sender use the default 'ptime', or the 'ptime' which is an
   integer multiple of the frame size and lower than the indicated
   'ptime'?  In case of an indicated 'maxptime', should it take a value
   as close as possible to the indicated 'ptime' but lower than the
   'maxptime'?

   This proposal follows the IETF architectural principle of "be strict
   when sending" and "be tolerant when receiving" [RFC1958].

4.1.  Sending party RTP voice payload

   The transmitting side of a connection needs to know the packetization
   time it can use for the RTP payload data, i.e. how many speech frames
   it can include in the RTP packet.  A trade-off between the
   packetization delay and the transmission efficiency has to be made
   and this can be a static or a dynamic process which involves all
   elements in the end-to-end chain.

   As such, three different sources for determining the packetization
   time are considered.

4.1.1.  ptime(s) - Static

   Statically provided values in the end-device: default values or
   manually defined values.

   An end-device implementation must know:
   1.  all the codec specific parameters such as:
       1.  Sampling rate (e.g. 8000 Hz).
       2.  Number of channels (e.g. 1).
       3.  Frame size in ms (e.g. 20 ms).
       4.  Number of encoded bits per frame (e.g. 264 bits).
       5.  Number of required octets per frame (e.g.  G.723.1 at 6.4
           kbps has 189 bits of encoded data per 30 ms frame, i.e. a
           datarate of 6.3 kbps.  However, the packet data is octet
           aligned, so 3 bits are added, which results in 24
           octets/frame or a datarate of 6.4 kbps).
   2.  system specific parameters such as:
       1.  MTU supported by the network and by the protocol stack of the
           end-device.
       2.  Packetization time (e.g. 60 ms) and the maximum packetization
           time (e.g. 150 ms).
       3.  Supported codecs.
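
   The octet alignment arithmetic of the G.723.1 example in the list
   above can be checked with a few lines of Python.  This is only a
   sketch; the function names are hypothetical.

```python
def octets_per_frame(bits_per_frame):
    """Octet-aligned frame size: round the encoded bits up to whole octets."""
    return (bits_per_frame + 7) // 8


def rate_kbps(bits, frame_ms):
    """Data rate in kbit/s for `bits` sent every `frame_ms` milliseconds."""
    return bits / frame_ms


bits = 189       # encoded bits per G.723.1 frame, as in the list above
frame_ms = 30    # G.723.1 frame size in ms
octets = octets_per_frame(bits)

# 189 bits are padded to 24 octets (192 bits): 6.3 kbps becomes 6.4 kbps.
print(octets, rate_kbps(bits, frame_ms), rate_kbps(octets * 8, frame_ms))
```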

4.1.2.  ptime(d) - Dynamic

   Dynamically provided values defined by the network architecture.

   The network can indicate, as part of the device management, its
   supported codecs, the 'ptime' and 'maxptime'.  These values can also
   change based on the dynamic behavior of the network.  During heavy
   load on the network, the network architecture can decide to use lower
   rate codecs (for bandwidth issues) and/or higher packetization times
   (for packet processing performance).  This dynamic change can be done
   before, during or after a session.

4.1.3.  ptime(i) - Indicated

   Proposed indicated values coming from the receiving side.

   The receiving side can indicate in the SDP the 'ptime' and 'maxptime'
   values it wants to receive.  These are optional, codec independent
   parameters for the media and should only be considered as a hint to
   the sending party.

4.1.4.  ptime/maxptime algorithm

   Instead of indicating a 'ptime/maxptime' on a per-codec basis as done
   in many different proposals, this draft proposes to make use of the
   'ptime/maxptime' as a common parameter coming from different sources:
   ptime(s), ptime(d), ptime(i) and maxptime(s), maxptime(d),
   maxptime(i).

   Depending on the available information for the 'ptime' and
   'maxptime', the packetization time "pt" which will be used for the
   transmission is based on the following algorithm.
   1.  Determine codec to be used, e.g.  G723 based on local info or the
       optional network info.
   2.  Determine coding data rate, e.g. 6.4 kbps based on local info or
       the optional network info.
   3.  Based on the codec, the frame size in ms is known: fc = frame
       size of the codec.
   4.  Determine the MTU size which can be used.  Based on this value,
       the codec frame size and datarate, a 'maxptime' related to the
       codec "mc" can be calculated.
   5.  Check the ptime(s, d, i) and maxptime(s, d, i, mc).  Take the
       maximum value from the available set of ptime(s, d, i) which is
       lower than or equal to the minimum value in the set maxptime(s,
       d, i, mc).
   6.  Normalize this 'ptime' value to the integer multiple of the frame
       size lower than or equal to this 'ptime' value and lower than or
       equal to the "mc", but not lower than the codec frame size.
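
   Step 4 of the list above can be sketched as follows.  The memo does
   not give a formula for "mc"; this sketch assumes 40 header octets
   (IPv4, UDP and RTP) and whole frames only, and the function name is
   hypothetical.

```python
HEADER_OCTETS = 20 + 8 + 12   # assumed IPv4 + UDP + RTP headers


def max_ptime_for_mtu(mtu_octets, frame_ms, frame_octets):
    """Largest packetization time (ms) whose whole frames fit in the MTU.

    Returns 0 when not even one frame fits.
    """
    room = mtu_octets - HEADER_OCTETS
    frames = max(room // frame_octets, 0)
    return frames * frame_ms


# G.723.1: 30 ms frames of 24 octets in a 1500 octet MTU.
print(max_ptime_for_mtu(1500, 30, 24))    # 1800
# 20 ms of 64 kbit/s audio (160 octets) treated as one frame.
print(max_ptime_for_mtu(1500, 20, 160))   # 180
```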

   Remark:
   It is up to the local policy of the device to determine which
   'ptime/maxptime' sources it will use in its calculation, e.g. it is
   possible to disallow the treatment of the 'ptime' indicated by the
   other side.  This can easily be done by including/excluding the
   'ptime/maxptime' values from the vectors used in the calculation.

   The formula to calculate the packetization time for the transmission
   of voice packets in the RTP payload data has the following input
   parameters.

   1.  The packetization time values made available from different
       sources.  When no value is known, the frame size of the voice
       codec is used.
   2.  The maximum packetization time values made available from
       different sources.  When no value is known, the frame size of the
       voice codec is used.
   3.  The frame size of the codec.
   4.  The packetization time corresponding with the selected codec,
       frame size, frame datarate and the network MTU.  This
       packetization time has to be larger than or equal to the frame
       size: at least one frame has to fit in the MTU.

   The function has one output parameter: the packetization time which
   has to be used for the transmission: "pt".  It is the frame size of
   the codec multiplied by the number of frames which have to be placed
   in the RTP payload based on the provided 'ptime' and 'maxptime'
   values.  In the formula, the maximum packetization time related to
   the MTU is added to the vector "mp" which contains one or more
   maximum packetization time values.  The minimum value out of this
   set is determined.  In the 'ptime' set "p", which contains one or
   more values, every value which is higher than the minimum value of
   the 'maxptime' set "mp" is replaced by this minimum.  Then the
   maximum value out of the set "p" is determined and used to calculate
   the number of voice frames which can be included with that
   packetization time.

   Some examples are provided.  The first example is related to G723
   with a frame size of 30 ms.  When the receiver has indicated a
   'ptime' of 20 ms in the SDP, the RTP packets will be sent with one
   voice frame of 30 ms, provided that the 'maxptime' allows it.
   In another example, with a G711 codec, a default 'ptime' of 20 ms and
   an indicated 'ptime' of 60 ms, 3 speech frames of 20 ms can be
   transmitted in one RTP packet towards the receiver which has
   indicated its ability to receive RTP packets with 60 ms packetization
   time.

   This "pt" is used to allocate the PCM buffer size where the voice
   samples from the synchronous network interface are stored before
   being passed in RTP packets towards the packet oriented network.

   When the 'ptime' and 'maxptime' are lower than the frame size of the
   codec, no packetization time for the transmission can be determined.
   An invalid value (=0) is indicated by the algorithm.  In that case,
   the sender has to select another codec with a voice frame size which
   is lower than or equal to the 'ptime' or 'maxptime'.

4.1.5.  Algorithm and examples

4.1.5.1.  Codec independent parameters

   o  p = vector containing all provided packetization time values such
      as static, dynamic, indicated values.
   o  mp = vector containing all provided maximum packetization time
      values.

   At least one "p" and one "mp" value have to be provided.  When no
   static, dynamic or indicated values are known, the frame size of the
   codec "fc" can be used.

4.1.5.2.  Codec dependent parameters

   o  fc = frame size of the codec
   o  mc = max packetization time which corresponds with the selected
      codec, frame size, frame datarate and the network MTU (mc >= fc).

4.1.5.3.  Pseudocode algorithm


    pt(p,mp,fc,mc) := |mp <- stack(mp,mc)
                      |if cols(p)>0
                      |  for i in 0..cols(p)-1
                      |     p(i)<-min(mp) if p(i)>min(mp)
                      |otherwise
                      |     p<-min(mp) if p>min(mp)
                      |nf<-1 if (nf<-floor(max(p)/fc)<=0) & (min(mp)>fc)
                      |fc*nf


                           Pseudocode algorithm
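
   The pseudocode above can be transcribed into Python as a sketch.
   The transcription follows the pseudocode line by line, including
   the strict comparison min(mp) > fc; accepting a scalar or a list
   for "p" and "mp" is a convenience assumed here, not something the
   pseudocode prescribes.

```python
import math


def pt(p, mp, fc, mc):
    """Packetization time (ms) to use for sending; 0 means no valid value.

    p  -- one ptime value or a list of them (static, dynamic, indicated)
    mp -- one maxptime value or a list of them
    fc -- codec frame size in ms
    mc -- maxptime derived from codec, data rate and MTU
    """
    p = list(p) if isinstance(p, (list, tuple)) else [p]
    mp = list(mp) if isinstance(mp, (list, tuple)) else [mp]
    limit = min(mp + [mc])                 # mp <- stack(mp, mc); min(mp)
    capped = [min(v, limit) for v in p]    # clamp every ptime to the limit
    nf = math.floor(max(capped) / fc)      # whole frames that fit
    if nf <= 0 and limit > fc:
        nf = 1                             # send at least one frame
    return fc * nf


print(pt(20, 60, 30, 100))                        # 30
print(pt(20, 20, 30, 100))                        # 0
print(pt([120, 40], [150, 200, 100], 10, 100))    # 100
```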

4.1.5.4.  Pseudocode examples


   ptime:=20         maxptime:=60          pt(ptime,maxptime,30,100)=30
   ptime:=20         maxptime:=20          pt(ptime,maxptime,30,100)=0
   ptime:=30         maxptime:=30          pt(ptime,maxptime,30,100)=30
   ptime:=60         maxptime:=80          pt(ptime,maxptime,30,100)=60

   ptime:=20         maxptime:=60          pt(ptime,maxptime,20,100)=20
   ptime:=60         maxptime:=80          pt(ptime,maxptime,20,100)=60
   ptime:=70         maxptime:=200         pt(ptime,maxptime,20,100)=60
   ptime:=120        maxptime:=60          pt(ptime,maxptime,20,100)=60

   ptime:=120        maxptime:=200         pt(ptime,maxptime,10,100)=100
   ptime:=[40,50,20] maxptime:=200         pt(ptime,maxptime,10,100)=50
   ptime:=[40,50,20] maxptime:=[40,50,20]  pt(ptime,maxptime,10,100)=20
   ptime:=[120,40] maxptime:=[150,200,100] pt(ptime,maxptime,10,100)=100


                            Pseudocode examples

4.2.  Receiving party RTP voice payload

   The receiver has to make use of the information in the RTP packets
   to determine the codec type, the frame rate and the total
   packetization time of the voice payload data.

   For the receiver, two parts in the data flow can be considered.
   First, the packet has to be received from the packet oriented
   network.  At the other side, mostly a synchronous network is provided
   where PCM voice samples are used.

   This proposal describes how the receiver can handle unknown
   packetization buffer requirements in a way which also allows inband
   changes of the codec datarate and packetization time.

   As indicated, there are different sources for the 'maxptime' and it
   is already described how a 'maxptime' value can be determined for
   sending it in the SDP indication.  The same 'maxptime' is used for
   the allocation of the PCM buffer space where the voice samples
   received in the RTP packets are stored before being transmitted
   towards the synchronous network, after de-jittering.  An indication
   is given to the DSP hardware about the actual packetization length
   obtained from the received RTP packet.  When the number of samples
   stored in the buffer corresponds to that packetization length, an
   interrupt is generated and the data is transmitted without having
   to wait for another RTP packet to fill up the remaining space.
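
   The buffer sizing described above can be sketched in Python.  The
   8000 Hz sampling rate, the 60 ms 'maxptime', the 20 ms received
   packet and the function name are assumptions chosen for
   illustration.

```python
def samples_for(duration_ms, sampling_rate_hz, channels=1):
    """Number of PCM samples that represent duration_ms of audio."""
    return sampling_rate_hz * duration_ms * channels // 1000


# The buffer is allocated once from 'maxptime' ...
capacity = samples_for(60, 8000)
# ... and the interrupt threshold follows the packetization length
# actually observed in a received RTP packet.
threshold = samples_for(20, 8000)

print(capacity, threshold)
```

   Because the threshold is taken from the received packet, a sender
   can change its packetization time inband without the receiver
   having to reallocate the buffer, as long as the new value stays at
   or below the 'maxptime'.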

4.3.  Procedures for the SDP offer/answer

   This section contains the procedures related to the calculation of
   the 'ptime' and 'maxptime' attributes when they are used by protocols
   following the SDP offer/answer model specified in [RFC3264].

4.3.1.  Procedures for an SDP offerer

   An SDP offerer may include a 'ptime' value and a 'maxptime' value in
   the SDP.  These values are merely an indication of the desired
   packetization times.  The same formula as for the "pt" is used to
   determine the 'ptime' in the SDP.  When the media line contains
   different codec formats, the 'ptime' value is determined for the
   first codec in the format list (i.e. the codec with the highest
   priority).  For the 'maxptime', the minimum value of the 'maxptime'
   value set is used in the SDP and normalized to an integer multiple of
   the frame size of the first codec in the list.
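
   A minimal sketch of the offerer procedure follows.  It assumes the
   'ptime' passed in is the "pt" result already computed for the first
   codec in the format list; the function name and the choice to omit
   'maxptime' when less than one frame fits are assumptions made here.

```python
def offer_attributes(ptime_ms, maxptimes_ms, first_codec_frame_ms):
    """Build the a=ptime / a=maxptime lines for an SDP offer.

    ptime_ms is assumed to be the "pt" result for the first codec in
    the format list.
    """
    fc = first_codec_frame_ms
    # Minimum of the 'maxptime' set, rounded down to whole frames.
    maxptime = (min(maxptimes_ms) // fc) * fc
    lines = [f"a=ptime:{ptime_ms}"]
    if maxptime >= fc:
        # Assumption: leave out 'maxptime' when not even one frame fits.
        lines.append(f"a=maxptime:{maxptime}")
    return lines


# First codec with 30 ms frames and maxptime sources of 80, 200, 100 ms.
print(offer_attributes(60, [80, 200, 100], 30))
```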

   It is up to the local policy of the device to determine which
   'ptime/maxptime' sources it will use in its calculation, e.g. it is
   possible to disallow the treatment of a certain 'ptime'.  This can
   easily be done by including/excluding the 'ptime/maxptime' values
   from the vectors used in the calculation.

4.3.2.  Procedures for an SDP answerer

   An SDP answerer that receives an SDP offer may also determine the
   'ptime' value and the 'maxptime' value to be included in the SDP
   answer.  These parameters are determined in the same way as done by
   the offerer.  However, the answerer can use another local policy to
   determine which 'ptime/maxptime' sources will be used in the
   calculation.

4.4.  Advantages

   The newly proposed method has the following advantages:
   1.  The basic idea of the 'ptime' related RFCs is kept.  No new
       parameters have to be added and no new interpretations or
       semantic reordering has to be done.
   2.  The new method is strict in sending and tolerant in receiving.
       It sends with the maximum allowed 'ptime' lower than or equal to
       the minimal 'maxptime'.
   3.  Different sources for the 'ptime' and 'maxptime' are taken into
       account, even more than in the different current proposals
       trying to negotiate end-to-end.
   4.  A local policy in the end-device can easily be adopted and
       adapted without requiring changes in the end-to-end protocol.
   5.  The algorithm makes use of all the provided information about
       'ptime', 'maxptime', codec frame size and MTU size, and proposes
       the optimal 'ptime'.
   6.  The same algorithm is used at the sending and receiving side, for
       SDP indications and RTP packets.
   7.  The algorithm is small and straightforward.  Codec dependent and
       codec independent parameters are clearly indicated.


5.  Conclusion and next steps

   This memo advocates the need for a standardized mechanism to
   indicate the packetization time on a per codec basis, allowing the
   creator of SDP to include several payload formats in the same media
   description line with different packetization times.

   This memo encourages discussion in the MMUSIC WG mailing list in the
   IETF.  The ultimate goal is to define a standard mechanism that
   fulfils the requirements highlighted in this memo.

   The goal is to find a solution which does not require changes in
   implementations which have followed the existing RFC guidelines and
   which are able to receive any packetization time.


6.  Security Considerations

   This memo discusses a problem statement and requirements.  As such,
   no protocol that can suffer attacks is defined.


7.  IANA Considerations

   This document does not request IANA to take any action.


8.  References

8.1.  Normative References

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

8.2.  Informative References

   [ITU.V152]
              ITU-T, "Procedures for supporting voice-band data over IP
              networks", ITU-T Recommendation V.152, January 2005.

   [ITU.G114]
              ITU-T, "One-way transmission time", ITU-T
              Recommendation G.114, May 2005.

   [PKT.PKT-SP-EC-MGCP]
              PacketCable, "PacketCable Network-Based Call Signaling
              Protocol Specification", PacketCable PKT-SP-EC-MGCP-I11-
              050812, August 2005.

   [PKT.PKT-SP-CODEC-MEDIA]
              PacketCable, "Codec and Media Specification",
              PacketCable PKT-SP-CODEC-MEDIA-I02-061013, October 2006.

   [I-D.ietf-mmusic-sdp-capability-negotiation]
              Andreasen, F., "SDP Capability Negotiation",
              draft-ietf-mmusic-sdp-capability-negotiation-08 (work in
              progress), December 2007.

   [RFC3890]  Westerlund, M., "A Transport Independent Bandwidth
              Modifier for the Session Description Protocol (SDP)",
              RFC 3890, September 2004.

   [RFC3108]  Kumar, R. and M. Mostafa, "Conventions for the use of the
              Session Description Protocol (SDP) for ATM Bearer
              Connections", RFC 3108, May 2001.

   [RFC4504]  Sinnreich, H., Lass, S., and C. Stredicke, "SIP Telephony
              Device Requirements and Configuration", RFC 4504,
              May 2006.

   [RFC3441]  Kumar, R., "Asynchronous Transfer Mode (ATM) Package for
              the Media Gateway Control Protocol (MGCP)", RFC 3441,
              January 2003.

   [RFC3952]  Duric, A. and S. Andersen, "Real-time Transport Protocol
              (RTP) Payload Format for internet Low Bit Rate Codec
              (iLBC) Speech", RFC 3952, December 2004.

   [RFC4060]  Xie, Q. and D. Pearce, "RTP Payload Formats for European
              Telecommunications Standards Institute (ETSI) European
              Standard ES 202 050, ES 202 211, and ES 202 212
              Distributed Speech Recognition Encoding", RFC 4060,
              May 2005.

   [RFC1958]  Carpenter, B., "Architectural Principles of the Internet",
              RFC 1958, June 1996.

   [RFC2327]  Handley, M. and V. Jacobson, "SDP: Session Description
              Protocol", RFC 2327, April 1998.

   [RFC3267]  Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie,
              "Real-Time Transport Protocol (RTP) Payload Format and
              File Storage Format for the Adaptive Multi-Rate (AMR) and
              Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs",
              RFC 3267, June 2002.

   [RFC3016]  Kikuchi, Y., Nomura, T., Fukunaga, S., Matsui, Y., and H.
              Kimata, "RTP Payload Format for MPEG-4 Audio/Visual
              Streams", RFC 3016, November 2000.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC 3551,
              July 2003.


Appendix A.  Related RFCs for ptime

   Many RFCs refer to the 'ptime' and 'maxptime' attributes in order to
   give definitions, recommendations, requirements, or default values.

   [RFC4566] defines 'ptime' and 'maxptime' as:

      a=ptime:[packet time]
      "This gives the length of time in milliseconds represented by the
      media in a packet.  This is probably only meaningful for audio
      data, but may be used with other media types if it makes sense.
      It should not be necessary to know ptime to decode RTP or vat
      audio, and it is intended as a recommendation for the encoding/
      packetization of audio.  It is a media-level attribute, and it is
      not dependent on charset."

      a=maxptime:[maximum packet time]
      "This gives the maximum amount of media that can be encapsulated
      in each packet, expressed as time in milliseconds.  The time SHALL
      be calculated as the sum of the time the media present in the
      packet represents.  For frame-based codecs, the time SHOULD be an
      integer multiple of the frame size.  This attribute is probably
      only meaningful for audio data, but may be used with other media
      types if it makes sense.  It is a media-level attribute, and it is
      not dependent on charset."

      "Additional encoding parameters MAY be defined in the future, but
      codec-specific parameters SHOULD NOT be added.  Parameters added
      to an "a=rtpmap:" attribute SHOULD only be those required for a
      session directory to make the choice of appropriate media to
      participate in a session.  Codec-specific parameters should be
      added in other attributes (for example, "a=fmtp:")."

      "Note: RTP audio formats typically do not include information
      about the number of samples per packet.  If a non-default (as
      defined in the RTP Audio/Video Profile) packetization is required,
      the 'ptime' attribute is used as given above."

   Remark:
   'maxptime' was introduced after the release of [RFC2327], and non-
   updated implementations will ignore this attribute.

   "SDP Offer/answer model" [RFC3264].
   Describes the requirements on 'ptime' for the SDP offerer and the
   SDP answerer.
   If the 'ptime' attribute is present for a stream, it indicates the
   desired packetization interval that the offerer would like to
   receive.  The 'ptime' attribute MUST be greater than zero.
   The answerer MAY include a non-zero 'ptime' attribute for any media
   stream.  This indicates the packetization interval that the answerer
   would like to receive.  There is no requirement for the packetization
   interval to be the same in each direction for a particular stream.

   "SDP Transport independent bandwidth modifier" [RFC3890].
   Identifies 'ptime' as a possible input for deriving bandwidth, but
   advises against using it for that purpose.  A separate parameter
   (the 'maxprate' attribute) is proposed instead.

   "SDP Conventions for ATM bearer" [RFC3108].
   It is not recommended to use the 'ptime' in ATM applications since
   packet period information is provided with other parameters (e.g. the
   profile type and number in the 'm' line, and the 'vsel', 'dsel' and
   'fsel' attributes).  Also, for AAL1 applications, 'ptime' is not
   applicable and should be flagged as an error.  If used in AAL2 and
   AAL5 applications, 'ptime' should be consistent with the rest of the
   SDP description.
   The 'vsel', 'dsel' and 'fsel' attributes refer generically to codecs.
   These can be used for service-specific codec negotiation and
   assignment in non-ATM as well as for ATM applications.
   The 'vsel' attribute indicates a prioritized list of one or more 3-
   tuples for voice service.  Each 3-tuple indicates a codec, an
   optional packet length and an optional packetization period.  This
   complements the 'm' line information and should be consistent with
   it.
   The 'vsel' attribute refers to all directions of a connection.  For a
   bidirectional connection, these are the forward and backward
   directions.  For a unidirectional connection, this can be either the
   backward or forward direction.
   The 'vsel' attribute is not meant to be used with bidirectional
   connections that have asymmetric codec configurations described in a
   single SDP descriptor.  For these, the 'onewaySel' attribute should
   be used.
   The 'vsel' line is structured with an encodingName, a packetLength
   and a packetTime.
   The packetLength is a decimal integer representation of the packet
   length in octets.  The packetTime is a decimal integer representation
   of the packetization interval in microseconds.  The parameters
   packetLength and packetTime can be set to "-" when not needed.  Also,
   the entire 'vsel' media attribute line can be omitted when not
   needed.

   "SIP device requirements and configuration" [RFC4504].
   In some cases, certain network architectures have constraints
   influencing the end devices.  The desired subset of codecs supported
   by the device SHOULD be configurable along with the order of
   preference.  Service providers SHOULD have the possibility of
   plugging in their own preferred codecs.  The codec settings MAY
   include the packet length and other parameters like silence
   suppression or comfort noise generation.  The set of available
   codecs will be used in the codec negotiation according to [RFC3264].
   Example: Codecs="speex/8000;ptime=20;cng=on,gsm;ptime=30"
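
   As an illustration only, the configuration string in the example
   above could be split into per-codec settings as sketched below.
   The grammar assumed here (codecs separated by ",", parameters
   separated by ";", each parameter written as name=value) is inferred
   from that single example, and the function name parse_codecs is
   hypothetical.

```python
def parse_codecs(value):
    """Split a Codecs string into (codec, parameters) pairs, in order."""
    codecs = []
    for entry in value.split(","):
        name, *params = entry.strip().split(";")
        # Assumption: every parameter has the form name=value.
        codecs.append((name, dict(p.split("=", 1) for p in params)))
    return codecs


print(parse_codecs("speex/8000;ptime=20;cng=on,gsm;ptime=30"))
```

   The list order is kept because the order of the codecs expresses
   the preference used in the negotiation.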

   "MGCP ATM package" [RFC3441].
   Packet time changed ("ptime(#)"):
   If armed via an R:atm/ptime, a media gateway signals a packetization
   period change through an O:atm/ptime.  The decimal number, in
   parentheses, is optional.  It is the new packetization period in
   milliseconds.  In AAL2 applications, the pftrans event can be used to
   cover packetization period changes (and codec changes).
   Voice codec selection (vsel): This is a prioritized list of one or
   more 3-tuples describing voice service.  Each vsel 3-tuple indicates
   a codec, an optional packet length and an optional packetization
   period.

   "RTP payload for iLBC" [RFC3952].
   The 'maxptime' SHOULD be a multiple of the frame size.  This
   attribute is probably only meaningful for audio data, but may be used
   with other media types if it makes sense.  It is a media attribute,
   and is not dependent on charset.  Note that this attribute was
   introduced after [RFC2327], and non-updated implementations will
   ignore this attribute.
   The 'ptime' parameter cannot be used to specify the iLBC operating
   mode, because for certain values it is impossible to distinguish
   which mode is in use (e.g., with 'ptime=60', it is impossible to
   tell whether the packet carries 2 frames of 30 ms or 3 frames of
   20 ms).
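
   The ambiguity can be made concrete with a small sketch.  It only
   assumes the two iLBC frame durations mentioned above (20 ms and
   30 ms); the names ILBC_FRAME_MS and possible_framings are
   hypothetical.

```python
ILBC_FRAME_MS = (20, 30)  # the two iLBC frame durations mentioned above


def possible_framings(ptime_ms):
    """Return (frame_ms, frame_count) pairs that exactly fill ptime_ms."""
    return [(frame, ptime_ms // frame)
            for frame in ILBC_FRAME_MS
            if ptime_ms % frame == 0]


for ptime in (20, 30, 40, 60):
    print(ptime, possible_framings(ptime))
```

   For a 'ptime' of 60 the function returns two candidates, (20, 3) and
   (30, 2), so the operating mode cannot be derived from 'ptime' alone.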

   "RTP payload for distributed speech recognition" [RFC4060].
   If 'maxptime' is not present, 'maxptime' is assumed to be 80 ms.
   Note: since the performance of most speech recognizers is extremely
   sensitive to consecutive frame pair (FP) losses, a user of the
   payload format who expects a high packet loss ratio for the session
   MAY consider explicitly choosing a 'maxptime' value for the session
   that is shorter than the default value.


Appendix B.  Ad-hoc solutions for multiple ptime

   In recent years, several solutions have been proposed and
   implemented with the goal of making 'ptime' a function of the codec
   instead of the media description, which contains a list of codecs.
   The solutions listed below show what kinds of proposals have been
   made to address the SDP interworking issues caused by differing
   implementations and RFC interpretations.  The list does not express
   a preference for any particular solution.

   In all these proposals, a semantic grouping of the codec specific
   information is made by giving a new interpretation of the sequence of
   the parameters or by providing new additional attributes.

   REMARK:
   All these methods violate the basic rule stated in the RFCs that
   'ptime' and 'maxptime' are media specific and NOT codec specific.
   They do not solve the interworking issues!  Instead, they make them
   worse by adding many new interpretations and implementations, as
   the following examples show.

   To avoid further divergence, the implementation community is
   strongly asking for a standardized solution.

B.1.  Method 1

   Write the rtpmap first, followed by the 'ptime' when it is related to
   the codec indicated by that rtpmap.

   This method tries to correlate a ptime with a specific codec, but
   many existing implementations will suffer from such a proposal.
   Some SDP encoder implementations first write the media line,
   followed by the rtpmap lines and then the other value attributes
   such as ptime and fmtp.  It is therefore difficult to know to which
   payload type the 'ptime' relates.  In the following example, it is
   hard to tell whether ptime:20 relates to payload type 0, to payload
   type 4, or to both, and it is unknown how the remote end will
   interpret this information.  Implementations that are fully
   compliant with the existing RFCs will suffer from such new
   proposals.


                          m=audio 1234 RTP/AVP 4 0
                          a=rtpmap:4 G723/8000
                          a=rtpmap:0 PCMU/8000
                          a=ptime:20
                          a=fmtp:4 bitrate=6400

                                 Method 1
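
   For comparison, the following sketch shows the reading in which
   'ptime' is a media-level attribute, so that its position among the
   attribute lines carries no meaning.  The parsing is deliberately
   simplified (a single media description, an integer 'ptime'), and
   the function name media_level_ptime is hypothetical.

```python
def media_level_ptime(sdp_media):
    """Map every payload type on the m= line to the media-level ptime."""
    payload_types, ptime = [], None
    for line in sdp_media.strip().splitlines():
        line = line.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            payload_types = [int(pt) for pt in line.split()[3:]]
        elif line.startswith("a=ptime:"):
            ptime = int(line[len("a=ptime:"):])
    return {pt: ptime for pt in payload_types}


METHOD_1 = """
m=audio 1234 RTP/AVP 4 0
a=rtpmap:4 G723/8000
a=rtpmap:0 PCMU/8000
a=ptime:20
a=fmtp:4 bitrate=6400
"""
print(media_level_ptime(METHOD_1))
```

   Under this reading both payload types 4 and 0 are associated with
   20 ms, which is not what the sender following Method 1 intended.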

B.2.  Method 2

   Grouping of all codec specific information together.

   Most implementers are in favor of this proposal, i.e. writing the
   value attributes associated with an rtpmap immediately after it.
   However, this is also a new interpretation.  Normally, the ptime
   refers to all payload types indicated in the m-line.  All existing
   implementations will also suffer from such a method.


                          m=audio 1234 RTP/AVP 4 0
                          a=rtpmap:4 G723/8000
                          a=fmtp:4 bitrate=6400
                          a=rtpmap:0 PCMU/8000
                          a=ptime:20

                                 Method 2

B.3.  Method 3

   Use the 'ptime' for every codec after its rtpmap definition.  This
   makes the 'ptime' a required parameter for each payload type.  It
   looks obvious, but it is not allowed according to the existing RFCs.
   It is also unclear whether the same construct would be used for
   'maxptime'.


                        m=audio 1234 RTP/AVP 0 18 4

                        a=rtpmap:18 G729/8000
                        a=ptime:30

                        a=rtpmap:0 PCMU/8000
                        a=ptime:40

                        a=rtpmap:4 G723/8000
                        a=ptime:60

                                 Method 3
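
   A receiver that adopted this ad-hoc convention would have to
   associate each 'ptime' line with the nearest preceding rtpmap line.
   The sketch below illustrates that positional reading only; it is
   not behaviour defined by any RFC, and the function name
   ptime_after_rtpmap is hypothetical.

```python
def ptime_after_rtpmap(sdp_media):
    """Associate each a=ptime line with the nearest preceding a=rtpmap."""
    result, current_pt = {}, None
    for line in sdp_media.splitlines():
        line = line.strip()
        if line.startswith("a=rtpmap:"):
            # a=rtpmap:<payload type> <encoding name>/<clock rate>
            current_pt = int(line[len("a=rtpmap:"):].split()[0])
        elif line.startswith("a=ptime:") and current_pt is not None:
            result[current_pt] = int(line[len("a=ptime:"):])
    return result


METHOD_3 = """
m=audio 1234 RTP/AVP 0 18 4

a=rtpmap:18 G729/8000
a=ptime:30

a=rtpmap:0 PCMU/8000
a=ptime:40

a=rtpmap:4 G723/8000
a=ptime:60
"""
print(ptime_after_rtpmap(METHOD_3))
```

   A parser that treats 'ptime' as a media-level attribute sees three
   conflicting values for the same media description instead.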

B.4.  Method 4

   Create a new 'mptime' (multiple ptime) attribute that contains
   different packetization times, each one mapped to its corresponding
   payload type in the preceding 'm=' line.  What happens when the
   other side sends an RTP stream with a different packetization time?
   Should the elements in the mptime attribute be interpreted as
   required values or as preferred values?  With this approach, the
   RFC compliant implementations are also affected and have to take
   the new mptime attribute into account.


                        m=audio 1234 RTP/AVP 0 18 4
                        a=mptime 40 30 60

                                 Method 4

B.5.  Method 5

   Use of a new 'x-ptime' attribute.  However, SDP parsers have
   complained about 'x-' prefixed attributes, and it was once
   suggested to use a name without 'x-' instead (e.g. 'xptime').  This
   is just another encoding of Method 4 and does not solve anything
   either.


                          m=audio 1234 RTP/AVP 0 8
                          a=x-ptime 20 30

                                 Method 5

B.6.  Method 6

   Use of different m-lines with one codec per m-line.
   However, this is a misuse, because different m-lines mean different
   audio streams and not different codec options.  This is therefore
   clearly against the existing SDP concept.


                          m=audio 1234 RTP/AVP 0
                          a=rtpmap:0 PCMU/8000
                          a=ptime:40

                          m=audio 1234 RTP/AVP 18
                          a=rtpmap:18 G729/8000
                          a=ptime:30

                          m=audio 1234 RTP/AVP 4
                          a=rtpmap:4 G723/8000
                          a=ptime:60

                                 Method 6

B.7.  Method 7

   Use of the 'ptime' in the 'fmtp' attribute.


                  m=audio 1234 RTP/AVP 4 18
                  a=rtpmap:18 G729/8000
                  a=fmtp:18 annexb=yes;ptime=20
                  a=maxptime:40

                  a=rtpmap:4 G723/8000
                  a=fmtp:4 bitrate=6.3;annexa=yes;ptime=30
                  a=maxptime:60

                                 Method 7

B.8.  Method 8

   Use of the 'vsel' attribute, as done for ATM bearer connections
   [RFC3108].  The following example indicates G.729 or G.729a (both
   are interoperable) as the first preference for the voice encoding
   scheme.  A packet length of 10 octets and a packetization interval
   of 10 ms are associated with this codec.  G726-32 is the second
   preference stated in this line, with an associated packet length of
   40 octets and a packetization interval of 10 ms.  If the packet
   length and packetization interval are to be omitted, each of them
   is replaced by '-' in the media attribute line, as in the second
   line of the example.


                   a=vsel:G729 10 10000 G726-32 40 10000
                   a=vsel:G729 - - G726-32 - -

                                 Method 8
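
   The two example lines can be split into 3-tuples as sketched below.
   The sketch follows the description given in Appendix A (packet
   length in octets, packetization interval in microseconds, "-" for
   an omitted value) and does not validate codec names.  The function
   name parse_vsel is hypothetical.

```python
def parse_vsel(line):
    """Split an a=vsel line into (codec, length_octets, ptime_us) tuples."""
    fields = line.split(":", 1)[1].split()
    if len(fields) % 3 != 0:
        raise ValueError("vsel expects a list of 3-tuples")

    def optional(value):
        # "-" marks an omitted packet length or packetization interval.
        return None if value == "-" else int(value)

    return [(fields[i], optional(fields[i + 1]), optional(fields[i + 2]))
            for i in range(0, len(fields), 3)]


print(parse_vsel("a=vsel:G729 10 10000 G726-32 40 10000"))
print(parse_vsel("a=vsel:G729 - - G726-32 - -"))
```

   The value 10000 in the first line is the 10 ms packetization
   interval expressed in microseconds.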

B.9.  Method 9

   Use of the [ITU.V152] 'maxmptime' (maximum multiple ptime)
   attribute, which contains different packetization times, each one
   mapped to its corresponding payload type in the preceding 'm=' line,
   to indicate the supported packetization period for all codec
   payload types.  This attribute is a media-level attribute and
   defines a list of maximum packetization time values, expressed in
   milliseconds, that the endpoint is capable of using (sending and
   receiving) for the connection.  When the 'maxmptime' attribute is
   present, the 'ptime' shall be ignored according to the V.152
   specification.  When the 'maxmptime' is absent, the value of the
   'ptime' attribute, if present, shall be taken as indicating the
   packetization period for all codecs present in the 'm=' line.
   The specification does not specify what has to be done when a
   'maxptime' is also present.  Does the 'maxmptime' indicate the
   absolute maximum that can be used as packetization time for a
   certain codec, or does it indicate the packetization time that is
   to be used by preference?  This is open to many different
   interpretations, certainly in interworking scenarios.


                   m=audio 3456 RTP/AVP 18 0 13 96 98 99
                   a=maxmptime:10 10 - - 20 20

                                 Method 9
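
   The positional mapping between the 'm=' line and the 'maxmptime'
   values can be sketched as follows.  Treating "-" as "no value
   given for this payload type" is an assumption made for this
   illustration, and the function name map_maxmptime is hypothetical.

```python
def map_maxmptime(m_line, attr_line):
    """Pair each payload type with the maxmptime value in its position."""
    payload_types = [int(pt) for pt in m_line.split()[3:]]
    values = attr_line.split(":", 1)[1].split()
    if len(values) != len(payload_types):
        raise ValueError("expected one value per payload type")
    # Assumption: "-" means that no value applies to this payload type.
    return {pt: (None if value == "-" else int(value))
            for pt, value in zip(payload_types, values)}


print(map_maxmptime("m=audio 3456 RTP/AVP 18 0 13 96 98 99",
                    "a=maxmptime:10 10 - - 20 20"))
```

   The mapping itself is unambiguous; what remains open is how the
   resulting values interact with 'ptime' and 'maxptime'.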

B.10.  Method 10

   Use of the PacketCable 'mptime' attribute.  See "Codec and Media
   Specification" [PKT.PKT-SP-CODEC-MEDIA], which gives the following
   note about the 'ptime': "[RFC4566] defines the 'maxptime' SDP
   attribute and V.152 defines the 'maxmptime' SDP attribute.  The
   precedence of these attributes with respect to the 'ptime' and
   'mptime' attributes is not defined at this time."

   Remark:
   This method is the same as Method 4.  However, in the
   [PKT.PKT-SP-CODEC-MEDIA] version from 9/2006, the mptime was removed
   and the maxptime was added.  PacketCable seems to be moving away
   from the need for multiple packetization times as a function of the
   codec and towards treating the issue as one of maximum end-to-end
   delay.

B.11.  Method 11

   Use of the SDP capability negotiation method.  See
   [I-D.ietf-mmusic-sdp-capability-negotiation], which describes how
   additional capabilities can be negotiated, such as the different
   supported ptimes.  This could be a possible solution in certain
   cases, but it also requires implementations that follow the basic
   ptime/maxptime concept to be updated in order to accommodate more
   restricted implementations.  It also introduces additional
   complexity by adding new parameters and new semantics.


Appendix C.  Background info

   The "Session Initiation Protocol" (SIP) is used to set up media
   sessions.  The SIP INVITE message carries a "Session Description
   Protocol" (SDP) description.  In the SDP media description part, the
   m-line contains the media type (e.g. audio), a transport port, a
   transport protocol (e.g. RTP/AVP) and a media format description
   that depends on the transport protocol.  For the transport protocols
   RTP/AVP and RTP/SAVP, the media format sub-field can contain a list
   of RTP payload type numbers.
   Example: m=audio 49232 RTP/AVP 8 0 4
   The "8 0 4" is the media format, a list of possible codecs
   identified by static or dynamic payload type numbers as defined in
   RFC 3551 [RFC3551].
   In the above example, a list of static numbers is used:
   8 = PCMA - G.711 PCM A-law
   0 = PCMU - G.711 PCM u-law
   4 = G723 - G.723.1

   PCMA and PCMU are "sample-based" codecs, while G723 is a
   "frame-based" codec.  All of them use a sampling rate of 8 kHz, i.e.
   0.125 ms/sample.  PCMA and PCMU encode each sample in 8 bits by
   making use of A-law or u-law logarithmic companding, resulting in a
   data rate of 64 kbps.  G723, however, does not operate on single
   samples, but on multiple samples combined in a "frame".  As such,
   higher compression rates can be achieved.  The G723 codec uses 240
   voice samples, corresponding to a speech frame duration of 30 ms.
   The codec compresses the data in the frame and encodes it with 192
   or 160 bits, resulting in a data rate of 6.4 or 5.3 kbps.  G723
   gives the advantage of a lower bit rate at the cost of increased
   voice delay: 30 ms instead of 0.125 ms.
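
   The figures in this paragraph follow directly from the sampling
   rate, as the following sketch shows.  It only uses the numbers
   given above; the function names frame_ms and data_rate_kbps are
   hypothetical.

```python
SAMPLE_RATE_HZ = 8000  # sampling rate used by PCMA, PCMU and G723


def frame_ms(samples_per_frame):
    """Duration in ms of a frame holding the given number of samples."""
    return 1000 * samples_per_frame / SAMPLE_RATE_HZ


def data_rate_kbps(bits_per_frame, samples_per_frame):
    """Codec data rate; bits per millisecond equals kbit/s."""
    return bits_per_frame / frame_ms(samples_per_frame)


print(frame_ms(1), data_rate_kbps(8, 1))                  # G.711
print(frame_ms(240), data_rate_kbps(192, 240))            # G723, 192 bits
print(frame_ms(240), round(data_rate_kbps(160, 240), 1))  # G723, 160 bits
```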

   The "International Telecommunication Union" (ITU) gives some
   guidelines on acceptable end-to-end delays in [ITU.G114].  A delay
   of up to 150 ms is acceptable.  Between 150 and 400 ms, there is an
   impact on the perceived voice quality, but the delay is still
   acceptable.  Above 400 ms, it becomes unacceptable.  Echo cancellers
   are required for delays above 25 ms.

   In "time division multiplexing" (TDM) networks, the coding delay is
   the biggest contributor to the end-to-end delay.  However, in
   "Packet Oriented" networks, packetization delays are added to the
   end-to-end delay and can become an issue.  Each packet has a header
   which contributes to the bandwidth usage, i.e. the total required
   bit rate.  The more data is packed together, the smaller the share
   of the header in the total packet and the higher the transmission
   efficiency.  However, combining more data in a packet increases the
   end-to-end delay.  As such, there is a trade-off between bandwidth
   usage, amount of packet processing and end-to-end delay.  For a
   codec with a higher compression rate, putting more data in a packet
   to improve the transmission efficiency reduces the quality because
   of the increased end-to-end delay.

   The following table gives an example in which G.711 (A-law or
   u-law) is compared with G.723.1 for different packetization delays.
   The headers, 78 bytes in total, consist of:

   o  RTP header: 12 bytes.
   o  UDP header: 8 bytes.
   o  IPv4 header: 20 bytes.
   o  MAC layer: 14 bytes.
   o  CRC: 4 bytes.
   o  Start frame + preamble: 20 bytes.

     Codec  Packet Datarate Voice    Headers Tot    Payload Throughput
            Delay           Payload
            ms     kbps     bytes    bytes   bytes  %       kbps
     -----------------------------------------------------------------
     G711   0.125  64          1     78        79    1.3    5056.0
              2.5  64         20     78        98   20.4     313.6
              5    64         40     78       118   33.9     188.8
             10    64         80     78       158   50.6     126.4
             20    64        160     78       238   67.2      95.2
             30    64        240     78       318   75.5      84.8
             90    64        720     78       798   90.2      70.9
            200    64       1600     78      1678   95.4      67.1
     -----------------------------------------------------------------
     G723.1  30    6.4        24     78       102   23.5      27.2
             60    6.4        48     78       126   38.1      16.8
             90    6.4        72     78       150   48.0      13.3
            150    6.4       120     78       198   60.6      10.6
            300    6.4       240     78       318   75.5       8.5
     -----------------------------------------------------------------


                       Packet delay & Throughput
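
   The rows of the table can be reproduced with the following sketch.
   It assumes the 78 header bytes listed above and the codec data
   rates used in the table (64 kbps and 6.4 kbps); the function name
   packet_stats is hypothetical.

```python
HEADER_BYTES = 12 + 8 + 20 + 14 + 4 + 20  # RTP, UDP, IPv4, MAC, CRC, preamble


def packet_stats(codec_kbps, packet_ms):
    """Return (payload bytes, total bytes, payload %, throughput kbps)."""
    payload = codec_kbps * packet_ms / 8   # kbit/s * ms = bits, / 8 = bytes
    total = payload + HEADER_BYTES
    efficiency = 100 * payload / total
    throughput = total * 8 / packet_ms     # bits per ms equals kbit/s
    return payload, total, efficiency, throughput


for codec, kbps, delays in (("G711", 64, (20, 30)),
                            ("G723.1", 6.4, (30, 90))):
    for ms in delays:
        payload, total, eff, rate = packet_stats(kbps, ms)
        print(f"{codec:7}{ms:4} ms {payload:6.0f} {total:6.0f} "
              f"{eff:5.1f}% {rate:6.1f} kbps")
```

   The throughput column is thus the on-the-wire bit rate including
   all headers, not the codec data rate.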

   For the same packetization delay of 30 ms, the data rate of G.723.1
   is 10 times lower than that of G.711, but the payload efficiency is
   reduced from 75.5% to 23.5%.  G.723.1 only reaches the same
   efficiency when the packetization delay is 300 ms!  Although the
   packet efficiency is lower, the required bit rate on the link is
   reduced from 84.8 kbps for G.711 to 27.2 kbps for G.723.1.  When
   several frames are packed together, e.g. 3 frames of 30 ms, the
   packetization delay becomes 90 ms.  This results in fewer packets
   that have to be routed and processed, and in an improved throughput
   data rate of 13.3 kbps.

   The frame sizes used by the different codecs are 0.125 ms (G.711),
   2.5 ms (G728), 10 ms (G729), 20 ms (G726, GSM, GSM-EFR, QCELP, LPC)
   and 30 ms (G723).  All of them have a default 'ptime' of 20 ms, with
   the exception of G723, which has a default 'ptime' of 30 ms.

   The media description part can contain additional attribute lines
   which complement or modify the media description line, such as the
   'ptime' and 'maxptime' attributes.

   Example:
   m=audio 49232 RTP/AVP 8 0 4
   a=ptime:20
   a=maxptime:60

   RFC 3551 [RFC3551] defines the default packetization time for each
   codec in Table 1.  PCMA and PCMU have a default 'ptime' of 20 ms,
   and G723 has a default 'ptime' of 30 ms.

   When, as in the example above, the 'ptime' value is 20, it is a
   wrong value for the G723 codec, which has a frame size of 30 ms and
   therefore requires a packetization delay of at least 30 ms.  This
   causes many interworking problems between different systems,
   because the relevant RFCs are interpreted differently, and results
   in bad voice quality or call setup failures.
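
   The mismatch can be detected mechanically, as sketched below.  The
   frame sizes are the ones given in this appendix for the three
   payload types of the example; the table FRAME_MS and the function
   name mismatched_payload_types are hypothetical.

```python
# Frame size in ms per payload type of the example (values from this appendix).
FRAME_MS = {8: 0.125, 0: 0.125, 4: 30}


def mismatched_payload_types(payload_types, ptime_ms):
    """Return the payload types whose frame size does not divide ptime_ms."""
    return [pt for pt in payload_types if ptime_ms % FRAME_MS[pt] != 0]


print(mismatched_payload_types([8, 0, 4], 20))  # [4]
print(mismatched_payload_types([8, 0, 4], 30))  # []
```

   With 'ptime' 20 only payload type 4 (G723) is flagged, while a
   'ptime' of 30 is a whole number of frames for all three codecs.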

   In some APIs, the following functions are provided to interface with
   the RTP and codec hardware layer for encoding voice samples, based on
   a certain codec, in RTP packets.
   1.  Set the encoding parameters such as codec type, payload type (for
       RTP) and packetization rate.  These parameters are mostly
       configuration parameters of the device.  They are either
       provided manually, based on guidelines from the network
       architecture, or provided dynamically and automatically.
   2.  Next, a transmit buffer has to be allocated.  The lower layer
       provides a function to calculate the required buffer size as a
       function of the encoding parameters.
   3.  A transmit buffer is allocated with the indicated size (as a
       minimum) by the application layer.
   4.  The synchronous voice data which has to be encoded is passed to
       the hardware layer, which encodes the data (codec and
       packetization) into the provided buffer.
   5.  The buffer with the RTP data is returned to the application,
       which can send it out on the host network interface towards the
       packet network.

   For the receiving part, the required API functions are:
   1.  Set the required decoding parameters such as codec type, payload
       type, initial latency in frames and jitter buffer info.  Please
       note that the packetization time is not required, because every
       receiver should be able to handle up to 200 ms, which in fact
       corresponds to the MTU size, and should have the resources
       required for it.
   2.  The required buffer size which needs to be allocated is requested
       from the hardware layer.  This size is calculated based on the
       size of the RTP header and the maximum allowed payload of 200 ms.

   * The application, however, can decide to allocate smaller buffers if
   the worst case is known for the expected RTP packetization time, i.e.
   by making use of the 'maxptime' attribute.
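
   The receive buffer calculation described above can be sketched as
   follows.  The sketch assumes a fixed RTP header of 12 bytes (no
   CSRC list or header extension) and a constant codec data rate; the
   function name rx_buffer_bytes is hypothetical and does not refer to
   any particular API.

```python
import math

RTP_HEADER_BYTES = 12  # assumption: fixed header, no CSRC list or extension


def rx_buffer_bytes(codec_kbps, max_packet_ms=200):
    """Buffer size for one RTP packet carrying up to max_packet_ms of media."""
    payload = math.ceil(codec_kbps * max_packet_ms / 8)  # kbit/s * ms / 8 = bytes
    return RTP_HEADER_BYTES + payload


print(rx_buffer_bytes(64))      # G.711, worst case of 200 ms
print(rx_buffer_bytes(64, 60))  # G.711, worst case limited by a=maxptime:60
```

   For G.711 the 200 ms worst case needs 1600 payload bytes, which is
   why it is compared with the MTU size above.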

   Most implementations make use of a general purpose host processor
   (GPP) in combination with a digital signal processor (DSP) for the
   codec/packetization part.  The host processor has the interface with
   the packet oriented world, while the DSP has an interface with a
   real-time synchronous network, mostly with a special buffer handling
   mechanism to avoid excessive interrupt handling.

   Suppose a VoIP call makes use of G711 A-law or u-law.  Most hardware
   solutions use a DSP to handle the real-time processing.  Most of
   these DSPs have special built-in hardware functionality for PCM
   samples.  The DSP can be configured for A-law or u-law and for a
   specific clock rate.  For every transmitted or received PCM sample,
   the hardware can generate an interrupt.  But this is of course a big
   burden on the system performance.  As such, the DSPs also provide a
   method to avoid this interrupt burden by providing a mechanism based
   on an internal buffer.  An interrupt is only generated when the
   buffer is empty or full.  The initialization of this DSP hardware
   for a specific call is done at SDP negotiation time during the SIP
   INVITE transaction.


                         m=audio 1234 RTP/AVP 0 8 4
                         a=ptime:30

                                  Example

   So, if this SDP contains the payload types 0, 8 and 4 (i.e. G711u,
   G711A and G723) and a 'ptime' of 30, then this 'ptime' can be used to
   initialize the DSP port with a buffer size for 30 ms of PCM voice
   samples.  When the "offerer" sends an RTP packet for G711u or G711A
   using the default value of 20 ms, the DSP PCM port waits for 30 ms
   of samples before sending out the buffer.  Because only 20 ms are
   received in the RTP packet, it has to wait for the next RTP packet
   before it is able to transmit the buffer, causing a serious
   degradation of the voice quality.
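
   The effect can be illustrated with a toy model.  The model is an
   assumption made for this sketch and not a description of any real
   DSP: packets of packet_ms arrive at a perfectly regular pace, and
   the buffer is handed to the PCM port as soon as it holds buffer_ms
   of audio.  The function name buffer_release_times is hypothetical.

```python
def buffer_release_times(buffer_ms, packet_ms, packet_count):
    """Times (ms) at which a buffer of buffer_ms becomes full and is sent."""
    fill, releases = 0, []
    for k in range(1, packet_count + 1):
        now = k * packet_ms   # arrival time of the k-th packet
        fill += packet_ms     # each packet adds packet_ms of audio
        while fill >= buffer_ms:
            releases.append(now)
            fill -= buffer_ms
    return releases


print(buffer_release_times(30, 20, 6))  # 20 ms packets into a 30 ms buffer
print(buffer_release_times(30, 30, 4))  # matching packetization time
```

   In this model the 30 ms buffer fed with 20 ms packets is released at
   40, 60, 100 and 120 ms, i.e. with alternating gaps of 20 and 40 ms,
   whereas matching packets release it at a steady 30 ms pace.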

   This can be a problem in DSP based solutions in media gateways
   between the IP and PSTN worlds, but also in end user internet access
   devices (IAD) that make it possible to attach a normal analog voice
   phone via an RJ11 jack (ATA - analog telephone adapter).

   For this use case, certain implementers argue in favor of a
   complete SDP negotiation mechanism.  But this is in conflict with
   the SDP paradigm, where 'ptime' is an optional parameter that is
   not bound to a specific codec but to the media itself.  Different
   proprietary solutions are now implemented, causing even more
   interworking issues.


Authors' Addresses

   Marc Willekens
   Devoteam Telecom & Media
   Herentals, Antwerp  2200
   Belgium

   Email: marc.willekens@devoteam.com


   Miguel A. Garcia-Martin
   Ericsson
   Via de los Poblados 13
   Madrid,   28033
   Spain

   Email: Miguel.A.Garcia@ericsson.com


   Peili Xu
   Huawei Technologies
   Bantian
   Longgang, Shenzhen  518129
   China

   Email: xupeili@huawei.com


Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.