AVT                                                     Christian Hoene
Internet Draft                                  University of Tuebingen
Intended status: Informational                          August 17, 2009
Expires: February 2010



            Requirements of an Audio Communication System (ACS)
                  draft-hoene-avt-acs-requirements-00.txt


Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79. This document may not be modified,
   and derivative works of it may not be created, except to publish it
   as an RFC and to translate it into languages other than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on February 17, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).




Hoene                 Expires February 17, 2010                [Page 1]


Internet-Draft           Requirements of ACS                August 2009


   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   This document describes the requirements of an audio communication
   system (ACS) for acoustic content, especially speech and music. The
   ACS consists of all components above the IP layer and below a digital
   PCM audio interface. These include codec, jitter buffer, and
   transport.

   The goal of the ACS is to provide a bidirectional acoustic
   communication between any two Internet hosts at a good quality,
   constrained only by the available resources at the hosts and the
   characteristics of the transmission path between both hosts.

   The intention of the document is to provide the requirements for a
   codec that is solely intended for the Internet, to provide the
   requirements for the codec's payload specification, and to define the
   requirements on the transport protocol.

Table of Contents

   1. Introduction...................................................3
      1.1. Basic Architectural Guidelines of the Public Internet.....4
      1.2. Problem Statement.........................................5
   2. Usage Scenarios................................................7
      2.1. Scenario 1: Person-to-person calls (VoIP).................8
      2.2. Scenario 2: High quality interactive audio transmissions
      (AoIP).........................................................8
      2.3. Scenario 3: Ensembles performing over a network (MMoIP)...9
      2.4. Scenario 4: Push-to-talk like service (PTT)...............9
   3. High-Level Requirements.......................................10
      3.1. Low cost and licensing free..............................10
      3.2. Reliable on the Internet.................................11
      3.3. Quality..................................................11
   4. Technical Requirements........................................12
      4.1. Audio content............................................12
      4.2. Quality..................................................12
      4.3. Reliability and congestion control.......................12
      4.4. Coding bit rate..........................................13
      4.5. Sampling rate............................................13
      4.6. Complexity...............................................13
      4.7. Latency..................................................14
      4.8. Packet rate..............................................14
      4.9. Packet loss resilience...................................15
      4.10. Frame erasure concealment...............................15
      4.11. Jitter compensation and playout buffer..................15
      4.12. Playout adjustments.....................................16
      4.13. Concealment of mode switches............................16
      4.14. Extrapolation...........................................16
      4.15. Interpolation...........................................17
      4.16. DTX.....................................................17
      4.17. Testing.................................................17
      4.18. Licensing and source code...............................17
      4.19. Versioning and software updates.........................18
      4.20. RFC Type................................................18
      4.21. Side channel............................................18
      4.22. Layered coding..........................................18
      4.23. Interoperability with PSTN..............................19
      4.24. Conferencing and speech recognition.....................19
      4.25. Self-testing support....................................19
      4.26. Self-awareness..........................................19
   5. Out of scope..................................................19
      5.1. Multichannel.............................................19
      5.2. Repacketization..........................................19
      5.3. Support for circuit-switched transmissions...............19
      5.4. Support of packet networks other than the Internet.......20
      5.5. Support of streaming.....................................20
      5.6. Random packet losses.....................................20
      5.7. Packet loss differentiation..............................20
      5.8. Robustness against bit errors............................20
      5.9. IRS and other kind of bandwidth filters..................20
      5.10. Support of voice band data, fax and DTMF................20
      5.11. Idle noise..............................................21
      5.12. Tandem coding...........................................21
      5.13. FEC.....................................................21
   6. Security Considerations.......................................21
   7. IANA Considerations...........................................21
   8. References....................................................21
      8.1. Normative References.....................................21
      8.2. Informative References...................................22
   9. Acknowledgments...............................................23

1. Introduction

   This document is based mainly on the discussions on the Codec BOF
   mailing list, which took place in 2009. It is also based on the
   internal requirement documents of ITU-T G.718 [SG16 314-WP3], on the
   ITU-T G.719 standard, on the 3GPP document [TS26.114-830], and on
   existing IETF codec drafts.

   It is intended as the basis of a requirements document that should
   lead to the design of an audio codec for the Internet. However, this
   document addresses the requirements of the entire system, not only
   those of a single component, because we want to ensure that the
   system as a whole works well, and not only some parts of it.

   We introduce the term audio communication system to describe the
   parts of an IP-based telephone that take care of the bidirectional
   transmission of acoustic content between two Internet hosts. These
   include the encoder, the payload encapsulation, guidelines on how to
   use transport protocols (RTP, UDP, TCP, DCCP), the playout buffer,
   the decoder, the concealment of packet loss, time adjustments,
   changes of encoding parameters, and various mechanisms to manage,
   control, and monitor the acoustic transmission.

   The ACS is intended mainly for use on the public Internet and
   should be as easily distributable as most other Internet protocols,
   which run on virtually all kinds of devices and on all kinds of
   communication links. Also, the ACS shall be affordable for everyone
   who has Internet access. If possible, it should be royalty free and
   available as open-source software. If these requirements are met,
   then the ACS can fulfill its goal of providing acoustic transmission
   between _any_ two Internet hosts.

1.1. Basic Architectural Guidelines of the Public Internet

   The ACS is intended for the public Internet and follows the same
   architectural design guidelines that apply to other Internet
   protocols. These include:

   o  End-to-end semantics saying that transport protocol units are
      transmitted from one end (an Internet host) to the other end
      without any intermediate changes.

   o  Network neutrality.

   o  Best-effort service that tries to transmit packets as well as
      possible but cannot guarantee any minimum transmission bandwidth
      or maximum transmission delay. Instead, one has to cope with
      whatever end-to-end transmission quality is provided.

   o  Congestion control (such as in TCP or DCCP) to prevent congestion
      collapse of the Internet. Typically, TCP controls the number of
      packets that are sent during periods of congestion. Thus, one has
      to consider that the number of packets per second, and not only
      the number of bits per second, might be the limiting constraint.

   o  Internet protocols are scalable to a wide degree. They work on
      links having a very low bandwidth (on the order of bits per
      second) and on links with a very high bandwidth (on the order of
      gigabits per second). The transmission latency can range from
      microseconds up to seconds. Also, Internet hosts might have very
      low processing and memory capabilities (such as an 8-bit
      microcontroller). However, even then they can communicate with any
      other host. Flow control (such as in TCP) is used to cope with
      hosts that have limited resources.

   o  Functions that help to monitor the communication (such as the
      features provided by ICMP).

   o  The most important Internet protocols can be used without paying
      royalties.

   o  The public Internet allows global communication between any two
      hosts connected to it. Typically, the user only has to pay for
      access to the public Internet, not for the distance that the IP
      packets have to travel.

   o  Internet standards should be as simple as possible (but no
      simpler).

1.2. Problem Statement

   The ACS should enable an acoustic communication between any two
   Internet hosts considering the features of the Internet as described
   above. We see the need for designing the ACS because we see the
   following weaknesses in the existing codec and VoIP designs.

   o  Many standardized speech and audio codecs require the payment of
      royalty fees. Only codecs such as G.711, G.722, G.722.1, and
      G.722.1C, which have mediocre performance, can be used license
      free. Thus, one cannot ensure that a good codec can be afforded by
      the owners of all Internet hosts.

   o  All known codecs have a small operational range and do not adapt
      to a wide range of bandwidths. For example, AMR supports bit rates
      between 4.75 and 12.2 kbps, and ITU-T G.719 supports rates between
      32 kbps and 128 kbps.

   o  Acoustic communication at superb transmission quality is not
      supported. In particular, if the latency is very low and the
      bandwidth is very high, there is no standardized codec that
      supports hifi quality at ultra-low delays. Only the SBC audio
      codec standardized by the Bluetooth SIG [A2DPV10] can be
      considered for this usage scenario. Ultra-low delay transmissions
      at hifi quality are especially useful for distributed ensemble
      performances or distributed choruses.

   o  Similarly, if the transmission quality is very bad, no
      standardized audio codec supports graceful degradation. If the
      loss rate becomes too high, then all speech and audio codecs
      become useless. However, in those cases one can use a half-duplex,
      push-to-talk like transmission of short audio segments that would
      still allow a very slow communication at very low bitrates.

   o  Frequently, a PSTN call needs to be transcoded. Transcoding
      reduces the speech quality and increases latency. Thus, most
      codecs are designed to work well under transcoding conditions.
      However, in the case of end-to-end IP transmission, the need for
      transcoding vanishes. It might only be needed for teleconferencing
      applications or for connecting to the PSTN.

   o  The quality of a PSTN call has hardly increased during the last
      decade. Often, it is even worse because of IP-based
      interconnections and the support of cellular networks. Even though
      wideband speech transmission systems have been developed, the
      unwillingness of users to pay more has limited the introduction of
      wideband speech. On the Internet, too, we do not expect users to
      pay more for high-quality phone calls. However, we believe that
      they will be delighted if they can communicate at nearly perfect
      quality.

   o  No standardized codec or its RTP payload RFC specifies how to cope
      with time-varying bandwidth and latency, nor is this considered a
      required feature. This hinders the widespread use of adaptive
      coding mode selection and thus reduces the quality of many
      Internet phone calls.

   o  Not a single standardized codec supports varying complexity levels
      to accommodate devices with limited resources.

   o  Standardized codecs do not support any functionality for self-
      observation and self-monitoring. Also, they do not provide
      information about how well they encoded and decoded the audio
      content under a given set of coding parameters and packet loss
      rates. However, this information is important for the transport in
      order to correctly adapt the codec's transmission parameters.

   o  Packet losses occur in the Internet, the transmission time of
      packets and the playout time vary, and the coding mode is changed
      in response to changes in the available transmission bandwidth.
      All these things cause the audio stream to be temporally
      distorted. The codec shall support concealment algorithms to limit
      the perceptual distortion. However, no existing standardized codec
      supports the concealment of playout time adjustments. Also,
      standardized PLC algorithms work by extrapolating previous audio
      segments and do not support interpolation. Lastly, one often
      cannot distinguish between the need to delay the playout time and
      a packet loss because the missing packet might still arrive. Thus,
      an algorithm that uses the same extrapolation for packet loss
      concealment and time stretching might be beneficial.

   o  None of the standardized interactive speech and audio codecs
      supports mechanisms to decrease the packet rate. Usually, packet
      rates are reduced by putting multiple speech frames into one RTP
      packet. However, the codecs do not take advantage of the high
      algorithmic delay that can then be utilized. Thus, they work less
      efficiently in situations of congestion.

2. Usage Scenarios

   The ACS should be optimized towards real-time communications over the
   Internet. It should support applications like collaborative network
   music performance, high-quality teleconferencing, wireless audio
   equipment, low-delay links for broadcast applications, network sound
   servers for using multimedia applications remotely, telepresence
   (enterprise) and the digital living room (consumer), and others.

   The ACS shall be general enough to support multiple and quite diverse
   network conditions. For example, if network latency is low and
   bandwidth is plentiful, it can be used for quasi-simultaneous music
   transmissions allowing distributed ensemble performances. It is also
   applicable to interactive hifi-quality audio transmission. If the
   network connection worsens, the transmission quality degrades to
   (wideband) interactive speech transmission. As a last resort, it
   emulates a high-delay, half-duplex, push-to-talk like communication
   service.

   In the following, we list four main scenarios and describe their
   quality requirements.

2.1. Scenario 1: Person-to-person calls (VoIP)

   The classic scenario is that of telephone usage, to which we will
   refer in this document as Voice over IP (VoIP). Human speech is
   transmitted interactively between two Internet hosts. Typically,
   besides speech, some background noise is present, too.

   The quality of a telephone call is traditionally judged with
   subjective tests such as those described in [ITU-T P.800]. The ACR
   scale used in MOS-LQS might sometimes not be very suitable for high
   quality; in that case, for example, the MUSHRA [ITU-T BS.1534-1]
   rating can be applied.

   A telephone call is considered good if it has a maximum mouth-to-ear
   delay of 150 ms [ITU-T G.107] and a speech quality of MOS-LQS 4 or
   above. However, communication between humans is still possible if
   the delay is much larger.

   This scenario does not include the use case of using a VoIP-PSTN
   gateway to connect to legacy telephone systems. In those cases, the
   gateway would make an audio conversion from broadband Internet voice
   to the frugal 1930's 3.1 kHz audio bandwidth. Interconnections to the
   PSTN will most likely stick with its legacy codecs to avoid
   transcoding.

2.2. Scenario 2: High quality interactive audio transmissions (AoIP)

   In this second scenario, we consider a telephone call having a very
   good audio quality at modest acoustic one-way latencies ranging from
   50 to 150 ms [ITU-T G.107], so that music can be listened to over the
   telephone while two persons talk interactively.

   The Absolute Category Rating (ACR) (refer to ITU-T P.800) can be
   used, too. However, it might be more efficient to measure quality
   with the MUSHRA tests given in [ITU-T BS.1534-1], which is intended
   for intermediate audio qualities.

   Also, for today's teleconferencing and videoconferencing systems
   there is a strong and increasing demand for audio coding providing
   the full human auditory bandwidth of 20 Hz to 20 kHz. This rising
   demand for high quality audio is due to the following:

   o  Conferencing systems are increasingly used for more elaborate
      presentations, often including music and sound effects, which
      occupy a wider audio bandwidth than that of speech. For example,
      Web conferencing services such as WebEx, GoToMeeting, and Adobe
      Acrobat Connect are based on IP transmission and would benefit
      from an IP-optimized ACS.

   o  The new "Telepresence" video conferencing systems (such as those
      from Cisco), which provide High Definition video and audio
      quality, give users the experience of being in the same room by
      introducing high-quality media delivery.

   o  The emerging Digital Living Rooms will likely be interconnected
      and might require a constant acoustic transmission at high
      qualities.

2.3. Scenario 3: Ensembles performing over a network (MMoIP)

   In some usage scenarios, users want to act simultaneously and not
   just interactively. For example, if persons sing in a chorus, if
   musicians jam, or if e-sportsmen play computer games in a team
   together they need to acoustically communicate. We call it the Make
   Music Over IP (MMoIP) scenario.

   In this scenario, the latency requirements are much stricter than for
   interactive usages. For example, if two musicians are placed more
   than 10 meters apart, they can hardly stay synchronized. Empirical
   studies [Gurevich2004] have shown that when ensembles play over
   networks, the optimal acoustic latency is around 11.5 ms, with a
   target range from 10 to 25 ms.
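
   As a rough illustration of these figures, the following Python
   sketch converts the distance between two musicians in a room into
   the one-way delay of sound traveling through air, and vice versa.
   The speed of sound of 343 m/s (dry air at about 20 degrees Celsius)
   and the function names are assumptions made for this example; they
   are not taken from [Gurevich2004].

```python
# Assumption: speed of sound in dry air at about 20 degrees Celsius.
SPEED_OF_SOUND_M_PER_S = 343.0


def acoustic_delay_ms(distance_m: float) -> float:
    """One-way delay of sound traveling distance_m through air, in ms."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


def equivalent_distance_m(latency_ms: float) -> float:
    """Distance through air that causes the same delay as latency_ms."""
    return latency_ms / 1000.0 * SPEED_OF_SOUND_M_PER_S


if __name__ == "__main__":
    for distance in (1.0, 4.0, 10.0):
        print(f"{distance:5.1f} m  -> {acoustic_delay_ms(distance):5.1f} ms")
    for latency in (10.0, 11.5, 25.0):
        print(f"{latency:5.1f} ms -> {equivalent_distance_m(latency):5.2f} m")
```

   Under this assumption, a distance of 10 meters corresponds to a
   delay of roughly 29 ms, which lies just above the 25 ms upper end of
   the target range.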

   In addition to the MUSHRA tests, the recommendation [ITU-R BS.1116]
   can be used for audio transmissions that just have minor impairments.

2.4. Scenario 4: Push-to-talk like service (PTT)

   In spite of the development of broadband access (xDSL), a lot of
   users still only have access via PSTN modems or mobile links. Also,
   on these links the available bandwidth might be shared among multiple
   flows and is subject to congestion. Then, even low coding rates of
   about 8 kbps are too high.

   If transmission capacity hardly exists, one can still degrade the
   quality of a telephone call to something like a push-to-talk (PTT)
   service having very high latencies. Technically, this scenario takes
   advantage of the bandwidth gains due to discontinuous transmission
   (DTX) modes and of very large packets containing multiple speech
   frames, which cause a very low packetization overhead.
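
   The effect of large packets on the packetization overhead can be
   sketched in Python as follows. The header sizes assume IPv4 without
   options (20 bytes), UDP (8 bytes), and a minimal RTP header (12
   bytes), and ignore link-layer headers. The 8 kbps coding rate with
   20 ms frames used in the usage loop is a hypothetical operating
   point chosen for illustration only.

```python
# Assumed minimum header sizes in bytes: IPv4 without options, UDP, and
# RTP without CSRC entries or extensions. Link-layer headers are ignored.
IPV4_HEADER_BYTES = 20
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12
HEADER_BYTES = IPV4_HEADER_BYTES + UDP_HEADER_BYTES + RTP_HEADER_BYTES


def packet_rate(frame_ms: float, frames_per_packet: int) -> float:
    """Packets per second when frames_per_packet frames share one packet."""
    return 1000.0 / (frame_ms * frames_per_packet)


def overhead_kbps(frame_ms: float, frames_per_packet: int) -> float:
    """Bit rate in kbit/s spent on IP/UDP/RTP headers."""
    return HEADER_BYTES * 8 * packet_rate(frame_ms, frames_per_packet) / 1000.0


def total_kbps(codec_kbps: float, frame_ms: float, frames_per_packet: int) -> float:
    """Codec bit rate plus header overhead, in kbit/s."""
    return codec_kbps + overhead_kbps(frame_ms, frames_per_packet)


if __name__ == "__main__":
    # Hypothetical operating point: 8 kbit/s codec with 20 ms frames.
    for frames in (1, 2, 5, 10, 25):
        print(
            f"{frames:2d} frames/packet: "
            f"{packet_rate(20.0, frames):5.1f} packets/s, "
            f"{total_kbps(8.0, 20.0, frames):5.1f} kbit/s total, "
            f"{20.0 * frames:5.0f} ms of audio per packet"
        )
```

   Each additional frame per packet lowers both the packet rate and the
   header overhead, but it also increases the amount of audio that must
   be collected before a packet can be sent, which is why this
   trade-off fits a high-latency PTT-like service.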

   The quality requirements of a push-to-talk like service have hardly
   been studied. As requirements of a Push To Talk over Cellular
   service, the OMA lists a transmission delay of 1.6 s and a MOS value
   above 3.0 that typically should be kept [OMAPoCReq]. However, as long
   as an understandable transmission of speech is possible, the delay
   can be even higher. For example, [OMAPoCReq] allows a delay of
   typically up to 4 s for the first talk-burst.

   Also, [OMAPoCReq] describes a maximum duration of speaking. If a
   participant speaking reaches the time limit, the participant's right-
   to-speak shall be automatically revoked.

   If the quality of a telephone call is very low, then instead of
   listening-only speech quality, the degree of understandability can be
   chosen as the performance metric. For example, objective tests of
   understandability use automatic speech recognition (ASR) systems and
   measure the number of correctly detected words.
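
   One possible way to turn an ASR transcript into such a metric is
   sketched below in Python. It computes a word accuracy as one minus
   the word error rate, where the error count is the word-level edit
   distance between a reference text and the recognized text. This
   document does not prescribe a formula, so the choice of metric, the
   function names, and the example sentences are assumptions made for
   illustration.

```python
def word_edit_distance(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, deletions, and insertions
    needed to turn the reference transcript into the hypothesis."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # previous[j] holds the distance between the reference words seen so
    # far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy in the range 0.0 to 1.0 (1 minus word error rate)."""
    reference_length = len(reference.split())
    if reference_length == 0:
        raise ValueError("reference transcript must not be empty")
    errors = word_edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / reference_length)


if __name__ == "__main__":
    spoken = "please call me back tomorrow"
    recognized = "please call back tomorrow"
    print(f"word accuracy: {word_accuracy(spoken, recognized):.2f}")
```

   In a test, the reference would be the text read by the talker and
   the hypothesis the output of the ASR system after the speech has
   passed through the transmission chain under test.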

   In any case, the participant shall be informed about the quality of
   the connection, the presence of high delays, the half-duplex style of
   communication, and his or her (limited) right-to-speak. For example,
   this can be achieved by a simulated talker echo.

3. High-Level Requirements

   Based on the four scenarios, we list the following high-level
   requirements that the ACS should fulfill.

3.1. Low cost and licensing free

   The codec shall be affordable by all humans having Internet access.

   Thus, one of the key requirements is a patent- and license-free
   technology. However, this cannot be seen as a "legally binding
   requirement" but rather as a desired working goal. Typically, one
   cannot verify with certainty whether a codec is totally free of
   unknown IPRs.
   Some patents may be overlooked. It can also be assured that the known
   IPRs are "license-free" and "free from the need to sign licensing
   agreement(s) before use" (The ability for any user to get the codec
   and use it without signing any paperwork).

   If one is practicing potentially patented technologies, there is no
   real mechanism to protect oneself from a patent troll that claims
   license fees for a standardized ACS. We have to assume that there is
   a certain probability that the designed ACS is covered by patents
   that the IETF is not aware of. Thus, one has to define proper
   procedures on how to cope with IPR claims even if the ACS is already
   standardized.

   Because of the lack of financial income, the codec's design, testing,
   and standardization process must be cost effective, too. A cheap
   approach is needed to characterize the ACS, which might include tests
   with volunteer participants. For example, codecs can be provided to
   thousands of users in public to test them. Also, potential
   performance comparisons need not be as precise or proven beyond any
   doubt, because nobody wins or loses IPR fees if one solution wins or
   fails.

3.2. Reliable on the Internet

   The ACS must be optimized towards acoustic real-time communications
   over the Internet, and must have the flexibility to adjust to the
   environment it operates in. Based on the quality of the end-to-end
   speech packet transmission, the codec should adapt its quality and
   delay to achieve an optimal benefit for the user.

   Like most Internet transport protocols, it should be usable under a
   wide range of conditions and provide high reliability regardless of
   the network conditions. The reliability of the audio transmission
   should be high,
   even in cases of low and varying bandwidth. This implies that the
   codec is used on top of a transport protocol that implements a
   congestion control algorithm and that the ACS adapts to changes of
   available bandwidth. For example, if the available transmission
   bandwidth is too low to allow the codec to transmit audio at a high
   quality, the application can lower the sampling, bit or frame rate of
   the stream at the cost of higher algorithmic delay or a degraded
   audio quality.
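
   The kind of adaptation described above can be sketched in Python as
   a selection among operating points. The mode table, its names, and
   its bit rates are purely hypothetical and are not requirements of
   this document. The header overhead assumes 40 bytes of IPv4, UDP,
   and RTP headers per packet.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: 20 (IPv4) + 8 (UDP) + 12 (RTP) header bytes per packet.
HEADER_BITS = 40 * 8


@dataclass(frozen=True)
class Mode:
    name: str
    sampling_khz: int
    codec_kbps: float
    packet_ms: float  # duration of audio carried by one packet

    @property
    def total_kbps(self) -> float:
        # Bits per millisecond are numerically equal to kbit/s.
        return self.codec_kbps + HEADER_BITS / self.packet_ms


# Hypothetical operating points, ordered from highest to lowest quality.
MODES = (
    Mode("fullband audio", 48, 128.0, 5.0),
    Mode("wideband speech", 16, 24.0, 20.0),
    Mode("narrowband speech", 8, 8.0, 20.0),
    Mode("narrowband speech, large packets", 8, 8.0, 200.0),
)


def select_mode(available_kbps: float) -> Optional[Mode]:
    """Return the best mode that fits into the available bandwidth,
    or None if even the most frugal mode does not fit."""
    for mode in MODES:
        if mode.total_kbps <= available_kbps:
            return mode
    return None


if __name__ == "__main__":
    for bandwidth in (500.0, 64.0, 30.0, 12.0, 5.0):
        mode = select_mode(bandwidth)
        label = mode.name if mode else "no audio possible"
        print(f"{bandwidth:6.1f} kbit/s available -> {label}")
```

   In this sketch, the last mode keeps the coding rate of the previous
   one but lowers the packet rate, trading a higher delay for a lower
   header overhead.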

3.3. Quality

   The ACS must provide a quality/bitrate trade-off that is competitive
   with other state-of-the-art codecs. Also, the codec must have a very
   low algorithmic delay so that it can support the typical requirements
   of its users.

   The speech and audio quality of the ACS should not be significantly
   worse than that of existing standardized codecs, if measured on the
   ACR scale.

4. Technical Requirements

4.1. Audio content

   At all bitrates the ACS must deliver speech in any language at good
   quality. The ACS must be tested for different speakers and at least
   with two languages and should support tonal languages as well.

   Frequently, speech needs to be transmitted not only without
   background noise but also under conditions including car, office, and
   street noise. Background signals shall be considered not as noise but
   as a part of the signal that conveys information. Background signals
   can include background music at an SNR of 25 dB, office noise at an
   SNR of 20 dB, car noise at an SNR of 15 dB, babble noise at an SNR of
   25 dB, an interfering talker at an SNR of 15 dB, and street noise at
   an SNR of 20 dB.
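
   Test material for such conditions is typically produced by scaling
   a noise recording and adding it to clean speech. The Python sketch
   below shows one way to do this, assuming that the SNR is defined as
   the ratio of the mean powers of the two signals over the whole
   file. This document does not specify a level measurement method, so
   this definition, the function names, and the synthetic signals are
   assumptions made for illustration.

```python
import math
from typing import List, Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Mean squared amplitude of a block of samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(
    speech: Sequence[float], noise: Sequence[float], snr_db: float
) -> List[float]:
    """Add noise to speech after scaling the noise so that the ratio of
    mean speech power to mean noise power equals snr_db."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_power = mean_power(speech)
    noise_power = mean_power(noise)
    if speech_power == 0.0 or noise_power == 0.0:
        raise ValueError("speech and noise must not be silent")
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


if __name__ == "__main__":
    # Synthetic stand-ins for real recordings: a 440 Hz tone sampled at
    # 8 kHz and a deterministic pseudo-noise sequence.
    speech = [math.sin(2.0 * math.pi * 440.0 * n / 8000.0) for n in range(800)]
    noise = [((n * 7919) % 200 - 100) / 100.0 for n in range(800)]
    car_noise_mix = mix_at_snr(speech, noise, snr_db=15.0)
    print(f"mixed {len(car_noise_mix)} samples at an SNR of 15 dB")
```

   With real recordings, the speech and noise lists would be replaced
   by samples read from audio files of equal length and sampling rate.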

   At high bitrates the quality must be excellent for any audio signal,
   especially music. Stereo is considered as a must. Also, for high
   quality audio conferencing, reverberant input signals should be
   considered for testing the modes.

   The speech and audio signals might have varying loudness. The
   transmission shall support a wide range of dynamics. The nominal
   input levels are -36 dB, -26 dB, and -16 dB with respect to the
   overload (OVL) point (-20 dBm0).

4.2. Quality

   In any given operational mode, the ACS need not have perfect quality
   and need not perform better than every other standardized codec.
   However, considering the most common network conditions, the ACS
   shall perform better than any combination of existing codecs most of
   the time.

4.3. Reliability and congestion control

   The acoustic transmission should be reliable and robust. The ACS
   shall be robust not only against packet losses but also during
   periods of low bandwidth.

   The mean availability of the audio transmissions, calculated over all
   users, might be one of the metrics for assessing the performance of
   an Internet audio codec.

   The ACS should adapt to the current network situation. Also, the
   codecs of the ACS themselves must be adaptable, because switching
   among multiple codecs is difficult to negotiate and unlikely to work
   well in situations of inter-operation.

   Responding to congestion is a more complex issue and out of the scope
   of this document. However, it shall be defined how to use existing
   congestion control protocols like DCCP and TCP. The ACS shall provide
   the mechanisms that congestion control requires from the codec (i.e.,
   bitrate/framerate adaptability).

   Because of the interactive nature of the acoustic transmission, the
   bidirectional transmission of audio content can be used for
   transmitting the required feedback and implementing a control loop.
   As such, it can be considered as a requirement that the acoustic
   transmission should be always bidirectional--even if the backward
   channel just sends "compressed silence".

4.4. Coding bit rate

   The ACS must be capable of running at bitrates below 10 kbps. At low
   bitrates it must deliver good quality for clean, noisy, or hands-free
   speech in any language. At high bitrates the quality must be
   excellent for any audio signal, including music. The bitrate can go
   up to 128 kbps per channel or more. The bitrate must be adjustable in
   real-time and at a fine granularity.

   Variable bit rates depending on the content should be supported.

4.5. Sampling rate

   The codec must support multiple sampling rates, ranging from 8 kHz to
   full band. Switching between sampling rates must be carried out in
   real-time.

4.6. Complexity

   The ACS should have a complexity that is adjustable in real-time,
   where a higher complexity setting improves the quality/bitrate trade-
   off.

   As a lower limit, the ACS shall run on hosts that are common in
   developing countries. These may include OLPC XO-1s or other low-end
   (refurbished) computers (refer to Computer Aid International) and
   smart phones like those based on the Texas Instruments Open
   Multimedia Application Platform (OMAP), which includes both a host
   ARM CPU and one or more DSPs.

   On those devices, the ACS must not be capable of running at highest
   quality but at least at 8 kHz sampling rate.

4.7. Latency

   To maintain a good quality for services requiring interactivity, it
   is necessary to keep the overall delay as low as possible. However,
   the delay requirement tends to have less importance in applications
   involving VoIP, possibly combined with other media and/or in
   heterogeneous network environments. A trade-off must be found between
   low delays and flexibility (scalability, ability to operate in
   various conditions with many types of signals etc.).

   In interactive scenarios, the codec should be capable of running with
   an algorithmic delay of no more than 30 milliseconds.

   For the making music scenario, the algorithmic delay must be between
   3 and 9 ms. Still, given the speed of light as the fundamental limit
   on the speed of information exchange, distributed ensembles can
   perform only regionally if a latency budget of 25 ms must be kept.
   Typically, an optical fiber has a refractive index of 1.46, and thus
   bits travel about 5130 km one-way through an optical fiber in 25 ms.
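
   As a rough check of the figure above, the reachable one-way distance
   can be computed from the refractive index. The following is only a
   sketch; the function and constant names are illustrative and not part
   of any specification. With the exact vacuum speed of light the result
   is roughly 5130 km; rounding c to 300000 km/s gives a slightly larger
   value.

```python
# Vacuum speed of light in km/s (exact by SI definition).
C_VACUUM_KM_PER_S = 299_792.458


def fiber_reach_km(delay_ms: float, refractive_index: float = 1.46) -> float:
    """One-way distance a signal covers in an optical fiber within delay_ms.

    The default refractive index of 1.46 is the typical value quoted in
    the text; real fibers and real routes (which are never straight
    lines) will give a shorter geographic reach.
    """
    speed_km_per_s = C_VACUUM_KM_PER_S / refractive_index
    return speed_km_per_s * delay_ms / 1000.0
```

   For example, fiber_reach_km(25.0) evaluates to about 5133 km.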

   The total codec delay consists of the algorithmic delay and the
   processing delay. Algorithmic delay includes the frame size delay
   plus any other delays inherent in the algorithm (look-ahead, noise
   suppression and error correcting codes for algorithm purposes and any
   algorithmic decoding delay). Processing delay is the additional delay
   caused by implementation with a finite speed processor.
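
   The decomposition above can be written down as a small data
   structure. This is a minimal sketch; the class, field and function
   names are hypothetical, and any values passed in are assumptions of
   the caller, not measurements of a particular codec. The 30 ms default
   is the interactive limit stated in this section.

```python
from dataclasses import dataclass


@dataclass
class CodecDelay:
    """Delay components of a codec, all in milliseconds (illustrative)."""

    frame_ms: float       # frame size delay
    lookahead_ms: float   # look-ahead and other delays inherent in the algorithm
    processing_ms: float  # extra delay caused by a finite-speed processor

    @property
    def algorithmic_ms(self) -> float:
        # Algorithmic delay = frame size delay + other inherent delays.
        return self.frame_ms + self.lookahead_ms

    @property
    def total_ms(self) -> float:
        # Total codec delay = algorithmic delay + processing delay.
        return self.algorithmic_ms + self.processing_ms


def meets_interactive_limit(delay: CodecDelay, limit_ms: float = 30.0) -> bool:
    """Check the algorithmic delay against the interactive-scenario limit."""
    return delay.algorithmic_ms <= limit_ms
```

   For instance, a configuration with 20 ms frames and 5 ms look-ahead
   has an algorithmic delay of 25 ms and passes the check, regardless of
   its processing delay.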

4.8. Packet rate

   The ACS must support a variable and dynamically changeable packet
   rate. Putting several frames into one packet is useful for packet
   grouping, which in turn is very useful for bandwidth adaptation and
   network usage efficiency.

   This is because a lot of bandwidth is used for protocol headers like
   those of Ethernet, IP, UDP, and RTP, and for overhead at the MAC
   layer. Even if IP header compression is applied, many layer 2
   protocols still introduce additional overhead that is not compressed
   [Hoene2005].
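
   To illustrate the header overhead, the following sketch computes the
   IP-level bitrate for a given number of frames per packet. It assumes
   IPv4 (20 bytes), UDP (8 bytes) and the fixed RTP header (12 bytes)
   without header compression, and it ignores layer 2 overhead. The
   function names and the example codec rate are illustrative.

```python
IPV4_HEADER_BYTES = 20
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12  # fixed RTP header, no CSRC list or extensions
HEADER_BYTES = IPV4_HEADER_BYTES + UDP_HEADER_BYTES + RTP_HEADER_BYTES


def gross_bitrate(codec_bps: float, frame_ms: float, frames_per_packet: int) -> float:
    """IP-level bitrate in bit/s, including IPv4/UDP/RTP headers."""
    packets_per_s = 1000.0 / (frame_ms * frames_per_packet)
    return codec_bps + HEADER_BYTES * 8 * packets_per_s


def header_share(codec_bps: float, frame_ms: float, frames_per_packet: int) -> float:
    """Fraction of the IP-level bitrate that is spent on headers."""
    gross = gross_bitrate(codec_bps, frame_ms, frames_per_packet)
    return (gross - codec_bps) / gross
```

   With an 8 kbit/s codec and 20 ms frames, one frame per packet results
   in 24 kbit/s at the IP level, whereas four frames per packet result
   in 12 kbit/s.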

   Classically, the number of frames per packet is specified in the RTP
   payload specification, not in the codec specification itself. In
   general, a codec can take advantage of a larger frame size. This is
   especially true for a transform codec, where a larger frame means
   better frequency resolution. The gain is somewhat smaller for a
   time-domain codec, especially for frames longer than 20 ms. However,
   in larger packets the inter-frame dependencies can be adjusted on the
   fly to choose a trade-off between bitrate and amount of error
   propagation. It may even be possible to just make use of more
   inter-frame correlation for frames 2...N in a packet of N frames and
   get most of the benefits that a larger frame size would provide.
   Thus, the ACS codec should support large frame sizes (up to an MTU).

4.9. Packet loss resilience

   The codec must be capable of running with little error propagation,
   meaning that the decoded signal after one or more packet losses is
   close to the decoded signal without packet losses after no more than
   two additional packets.  The codec must have a packet loss resilience
   that is adjustable in real-time, where a lower packet loss resilience
   setting improves the quality/bitrate trade-off.

   Also, the codec may add inter-frame redundancies to achieve better
   loss robustness.

4.10. Frame erasure concealment

   The ACS must have a packet loss concealment (PLC) algorithm. The PLC
   must be standardized so that it is known how well the decoder can
   cope with packet losses in cases when the transmission parameters
   must be adjusted. However, the ACS may implement a PLC that performs
   better than the standardized PLC.

   The purpose of standardizing the PLC (and the other concealment
   algorithms) is to guarantee a certain quality level over a range of
   conditions. For good results, a PLC operates on decoder-internal
   parameters and states, which requires tight algorithmic integration.
   So the PLC is as much part of a decoder as any other decoder module.
   The above also applies to time compression/stretching methods for
   handling network jitter and to other kinds of concealment algorithms
   (as mentioned below).

4.11. Jitter compensation and playout buffer

   The ACS must cope with jitter. It must be able to receive
   de-packetized frames out of order and present them in order for
   decoder consumption. It must be able to receive duplicate speech
   frames and present only unique speech frames to the decoder. It must
   be able to handle clock drift between the encoding and decoding
   end-points.

   The playout buffer should minimize the buffering time at all times
   while still conforming to the minimum performance requirements. If
   the limit of jitter-induced concealment operations cannot be met, it
   is always preferred to increase the buffering time in order to avoid
   a growing number of jitter-induced concealment operations.
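
   The reordering and duplicate-removal requirements of this section can
   be sketched as follows. This is not a complete playout buffer: it
   assumes sequence numbers that never wrap around, it does not schedule
   playout times, and it does not handle clock drift. The class and
   method names are hypothetical.

```python
import heapq


class ReorderBuffer:
    """Presents frames to the decoder in order and without duplicates."""

    def __init__(self) -> None:
        self._heap = []        # (sequence number, frame) pairs, smallest first
        self._queued = set()   # sequence numbers currently buffered
        self._next_seq = None  # next sequence number the decoder expects

    def push(self, seq: int, frame: bytes) -> bool:
        """Buffer a frame; return False for a duplicate or a too-late frame."""
        if seq in self._queued:
            return False
        if self._next_seq is not None and seq < self._next_seq:
            return False
        heapq.heappush(self._heap, (seq, frame))
        self._queued.add(seq)
        return True

    def pop(self):
        """Return (seq, frame), (seq, None) for a gap, or None if empty."""
        if not self._heap:
            return None
        if self._next_seq is None:
            # Assumption: playout starts at the oldest buffered frame.
            self._next_seq = self._heap[0][0]
        seq = self._next_seq
        self._next_seq += 1
        if self._heap[0][0] == seq:
            _, frame = heapq.heappop(self._heap)
            self._queued.discard(seq)
            return seq, frame
        return seq, None  # gap: the caller would run a concealment operation
```

   A frame pushed twice is accepted only once, and a gap in the sequence
   is reported to the caller so that a concealment operation can be
   applied for the missing frame.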

4.12. Playout adjustments

   The ACS should support time scale modifications, such as time
   stretching and time shrinking, especially for jitter compensation,
   because on the Internet jitter is the norm, not a special case.

   Because the operations going on in time scale modification algorithms
   are similar to those of the PLC, these operations should be combined
   into a single algorithm.

   Also, the ACS shall be able to determine a desired length of a time
   scale modification (so it can e.g. leave out or add one or more pitch
   periods), and to keep a 'backup' decoder state of the previous frame
   or to add one more frame length of decoding latency. Otherwise, the
   voice of the previous packet cannot be compressed, and stretching is
   suboptimal.

   In general, the use of a high-quality time scaling algorithm is
   recommended. The amount of scaling should be as low as possible,
   scaling should be applied as infrequently as possible, and
   oscillating behavior is not allowed.

4.13. Concealment of mode switches

   The ACS should also support the concealment of distortions caused by
   switching coding modes [Hoene2005]. Also, the negative effect of
   switching the coding mode shall be low.

   For example, the transmission and coding mode might change several
   times per second (at up to 5 Hz) after getting feedback from the
   decoder.

4.14. Extrapolation

   Sometimes, it is not possible to distinguish between a packet that
   arrives too late and a packet that is lost and needs to be concealed.
   The decision on whether to conceal the loss or whether to conduct
   time stretching cannot be made yet. Thus, the ACS should support a
   general extrapolation of the audio signal, which allows for a late
   decision on whether to play out a delayed packet or whether to use a
   loss concealment operation.

4.15. Interpolation

   If a packet n has not arrived but the previous packet n-1 and the
   following packet n+1 have, then packet n shall be interpolated using
   the frames of the previous and following packets.
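
   The choice between decoding, interpolation and the extrapolation of
   Section 4.14 can be summarized in a small decision function. This is
   a sketch under the assumption that packets are identified by
   consecutive integers; the function name and the returned labels are
   hypothetical, and the actual signal processing is not shown.

```python
def concealment_action(received: set, n: int) -> str:
    """Choose how to produce the audio for packet n.

    received holds the numbers of all packets that have arrived so far.
    """
    if n in received:
        return "decode"
    if (n - 1) in received and (n + 1) in received:
        # Both neighbours are available: interpolate between them.
        return "interpolate"
    # Packet n may be late or lost; extrapolate and decide later.
    return "extrapolate"
```

   Keeping the decision separate from the signal processing allows the
   same extrapolated signal to be continued either by the late packet or
   by a loss concealment operation.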

4.16. DTX

   The codec must be capable of using Discontinuous Transmission (DTX)
   where packets are sent at a reduced rate when the input signal
   contains only background noise.

4.17. Testing

   The testing of the ACS and the quality characterization shall be
   performed with real network profiles, such as those of [TIA-921] or
   those given in the appendix of [TS26.114-830], not with a fixed set
   of "average distributed errors and losses". The latter do not clearly
   reflect the nature of the Internet.

   Also, test vectors might be provided to check the correctness of the
   implementations.

4.18. Licensing and source code

   The usage of the ACS should not require paying royalties or signing
   an NDA. At the time of standardization it should be available
   royalty-free (RF) and on reasonable and non-discriminatory (RAND)
   terms. The codec should be available as open source, allowing
   implementation under BSD, LGPL and/or GPL.

   The codec specification description and implementation shall be based
   on bit-exact, fixed-point, modular ANSI-C code using the basic
   operator set provided in the ITU-T Software Tool Library. In
   addition, an interoperable floating-point implementation can be
   provided.

   The source code shall be normative for a number of reasons. One is
   ease of implementation (either using the reference code directly, or
   being able to use it to validate the ported code). Another is that it
   assures that the characterization tests actually measure the
   standard's performance. Even if it is not officially normative,
   readily available reference code becomes de facto normative, since
   most implementers will simply use the code and ignore the text in the
   RFC.

4.19. Versioning and software updates

   In order to cope with changes in the bitstream format, which might be
   required due to errors in the specification or, more importantly, due
   to newly claimed IPR, it must be possible to update the ACS online.

   Also, it must be indicated which bitstream format is going to be
   used.

4.20. RFC Type

   The ACS specification should become a standard, not an experimental
   RFC.

4.21. Side channel

   Congestion control should be a must for all Internet applications,
   including the ACS. Section 10 of [RFC3550] suggests that the RTP
   profile should take care of rate adaptation. Thus, the ACS should
   take advantage of a feedback loop for variable coding parameter
   control in order to allow a wide range of operation and to adapt to
   the currently available bandwidth and processing power.

   Congestion control per se is outside the purview of this group, but
   providing the hooks for a congestion-control mechanism to interact
   with the codec is quite important. For example, when this codec runs
   on a TFRC-enabled or DCCP RTP stream, TFRC and DCCP need to be able
   to adjust (via the application) the bitrate of the codec in order to
   implement congestion control, and perhaps to adjust packetization
   periods or packet rates.
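
   One possible shape of such a hook is sketched below. It is purely
   illustrative: the class name, the method name, the candidate
   packetization periods and the 6 kbit/s lower bound are assumptions,
   and the header size assumes uncompressed IPv4/UDP/RTP. The transport
   reports the rate it allows at the IP level, and the application
   derives a codec bitrate and a packetization period from it.

```python
HEADER_BITS = (20 + 8 + 12) * 8  # IPv4 + UDP + fixed RTP header, per packet


class EncoderControl:
    """Hypothetical hook through which a transport steers the encoder."""

    def __init__(self, min_bps: int = 6000, max_bps: int = 128000) -> None:
        self.min_bps = min_bps      # assumed lowest usable codec bitrate
        self.max_bps = max_bps      # assumed highest codec bitrate
        self.bitrate_bps = max_bps  # current codec bitrate
        self.packet_ms = 20         # current packetization period

    def on_allowed_rate(self, allowed_bps: float) -> None:
        """Called (via the application) with the allowed IP-level rate."""
        # Prefer short packets; lengthen them only if the headers would
        # otherwise leave less than min_bps for the codec.
        for packet_ms in (20, 40, 60, 80):
            budget = allowed_bps - HEADER_BITS * 1000.0 / packet_ms
            if budget >= self.min_bps:
                break
        self.packet_ms = packet_ms
        # If even 80 ms packets do not fit, this sketch stays at min_bps.
        self.bitrate_bps = int(max(self.min_bps, min(self.max_bps, budget)))
```

   With an allowed rate of 20 kbit/s, for example, this sketch moves to
   40 ms packets, which leaves 12 kbit/s for the codec.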

   A side channel for adaptation can be added. This would make sense
   because in the usage scenarios audio is always transmitted in both
   directions. Adding a control channel would give a real advantage over
   existing codec designs. Alternatively, such a side channel can be
   realized by other means, such as handling that communication in
   SIP/SDP and in RTP/RTCP.

4.22. Layered coding

   The ACS can support layered encoding, as in G.729.1 and G.718.
   Layered coding can be seen as a method for computationally efficient
   transcoding. Layered coding makes sense in the conferencing
   environment, as the stripping of layers should then be done at the
   sender after encoding. Then, for all receivers, the encoding has to
   be done only once.

   However, bidirectional transmissions do not need layered encoding.
   As most codecs are now VBR, it is already enough to adapt the codec
   (at the source) to the bandwidth. Also, layered coding comes at an
   additional cost (about 10% of the coding rate).

4.23. Interoperability with PSTN

   The ACS might be developed to be interoperable with existing PSTN
   systems. Interoperability with 2G and 3G mobile radio systems is
   especially desirable. Also, interoperability with G.722.2 at 12.65
   kbit/s and with G.722 (for DECT devices) is of particular interest.

4.24. Conferencing and speech recognition

   A teleconference server should be able to mix the audio signals at
   lower complexity than decoding + encoding. The ACS shall be capable
   of supporting automatic speech recognition.

4.25. Self-testing support

   The ACS should support means of testing the quality of a connection
   by feedback loops and quality feedback.

4.26. Self-awareness

   The ACS should be aware of how well it can transmit acoustic content
   at various coding parameters and packet loss rates.

5. Out of scope

5.1. Multichannel

   5.1 surround sound is worth supporting, but that would most likely be
   done through multiple independent channels or channel pairs, so it is
   probably not much of an issue.

5.2. Repacketization

   The ACS need not support repacketization in a network because this
   would violate the end-to-end semantics of the Internet.

5.3. Support for circuit-switched transmissions

   The ACS need not support circuit-switched transmission.

5.4. Support of packet networks other than the Internet

   The ACS need not support other packet networks (VoATM, private
   networks) besides the Internet.

5.5. Support of streaming

   The ACS need not support multimedia streaming (e.g. video + audio
   involving bit-rate tradeoff), multicast content distribution
   (offline/online) and message retrieval systems.

5.6. Random packet losses

   The usage of random packet losses to measure the concealment
   performance is meaningless because it does not reflect the nature of
   the Internet. Thus, the codec need not be optimized or tested using
   these criteria. Instead, real packet loss and delay traces should be
   considered. Also, short and long bursts of packet losses, which occur
   due to handoffs, fast fading, congestion events, and route changes,
   should be considered.
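
   When working with real loss traces, the distribution of burst lengths
   is a simple way to see how far a trace is from independent random
   losses. The following sketch assumes a trace given as a sequence of
   per-packet flags (true or 1 for a lost packet); the function name and
   the trace format are illustrative.

```python
from collections import Counter
from itertools import groupby


def loss_burst_histogram(trace):
    """Map burst length -> number of loss bursts of that length.

    trace is an iterable of per-packet flags, truthy for a lost packet.
    """
    return Counter(
        sum(1 for _ in run)             # length of one run of equal flags
        for lost, run in groupby(trace)
        if lost                         # keep only the runs of losses
    )
```

   A trace dominated by bursts of length one resembles random losses,
   whereas handoffs or route changes show up as long bursts.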

5.7. Packet loss differentiation

   The ACS cannot assume that the quality of packet transmission changes
   on a per-packet basis. For example, in layered coding the core layers
   cannot expect to be less subject to packet losses than enhancement
   layers.

5.8. Robustness against bit errors

   The ACS need not be robust against bit errors because they are quite
   rare on top of Ethernet. This is especially true as long as UDP-Lite
   is not supported widely.

5.9. IRS and other kinds of bandwidth filters

   The ACS must not consider bandwidth filters like the Intermediate
   Reference System (IRS) because they are based on the traditions of
   circuit-switched connections.

5.10. Support of voice band data, fax and DTMF

   The ACS need not support voice band data such as fax or DTMF.
   Instead, alternative ways of communication or other RTP payload
   formats should be considered.

5.11. Idle noise

   The generation of idle channel noise should not be used to indicate
   that the call is still active. Instead, in case of transmission
   problems an acoustic notification can be given.

5.12. Tandem coding

   The ACS need not be optimized for tandem coding conditions
   because one can assume an end-to-end transmission of IP packets.

   Tandem coding might only be used for PSTN gateways and for conference
   bridges.

5.13. FEC

   RTP support of Forward Error Correction (FEC) need not be considered.
   Also, support for adding "redundant speech frames", which have been
   transmitted in preceding packets, to an RTP packet is not required.
   Instead, the redundancy can be added by the encoder, which does this
   in a more efficient way.

6. Security Considerations

   To do.

7. IANA Considerations

   To do.

8. References

8.1. Normative References

   [ITU-T BS.1534-1] "BS.1534 : Method for the subjective assessment of
             intermediate quality levels of coding systems", ITU-R
             Recommendation BS.1534-1 (01/03).

   [ITU-T G.107] "G.107 : The E-model, a computational model for use in
             transmission planning", ITU-T Recommendation G.107 (04/09).

   [ITU-T P.800] "P.800 : Methods for subjective determination of
             transmission quality", ITU-T Recommendation P.800 (08/96).

   [ITU-R BS.1116] "BS.1116 : Methods for the subjective assessment of
             small impairments in audio systems including multichannel
             sound systems", ITU-R Recommendation BS.1116 (10/97).


   [OMAPoCReq] "Push to talk over Cellular Requirements", Open Mobile
             Alliance, Approved Version 1.0, 09 Jun 2006, OMA-RD-PoC-
             V1_0-20060609-A.pdf

   [TIA-921] TIA-921-A Document Information: "Network Model for
             Evaluating Multimedia Transmission Performance Over
             Internet Protocol", Publisher: Telecommunications Industry
             Association, Publication Date: Jun 18, 2008

   [TS26.114-830] 3GPP TS 26.114 V8.3.0, "IP Multimedia Subsystem (IMS);
             Multimedia telephony; Media handling and interaction",
             Rapporteur: Per Froejdh, Version 8.3.0, 2009-06-12,
             RTS/TSGS-0426114v830.

8.2. Informative References

   [A2DPV10] Bluetooth SIG, "Advanced Audio Distribution Profile", Audio
             Video WG, adopted specification, revision V1.0, May 22,
             2003.

   [celt-draft] J-M. Valin, T. Terriberry, G. Maxwell, C. Montgomery,
             "Constrained-Energy Lapped Transform (CELT) Codec",
             Internet draft, draft-valin-celt-codec-01, work in
             progress, July 13, 2009.

   [Gurevich2004] Gurevich, M., Chafe, C., Leslie, G., and Tyan, S.,
             "Simulation of Networked Ensemble Performance with Varying
             Time Delays: Characterization of Ensemble Accuracy",
             Proceedings of the 2004 International Computer Music
             Conference, Miami, USA, 2004.

   [Hoene2005] Hoene, C., Karl, H., and Wolisz, A., "A perceptual
             quality model intended for adaptive VoIP applications",
             International Journal of Communication Systems, Wiley,
             August 2005.

   [SG16 314-WP3] ITU-T SG16, "Agenda and list of documents for Q9/16",
             Temporary Document 314-WP3, Received on 2008-04-22 From
             Rapporteur Q9/16.

   [silk-draft] K. Vos, S. Jensen, K. Soerensen, "SILK Speech Codec",
             Internet draft, draft-vos-silk-00.txt, work in progress,
             July 6, 2009.

   [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
             Jacobson, "RTP: A Transport Protocol for Real-Time
             Applications", STD 64, RFC 3550, July 2003.


9. Acknowledgments

   The author would like to thank the various contributors who took part
   in the discussion on the Codec BOF mailing list in the period until
   September 2009. Also, this document is based on the SILK [silk-draft]
   and CELT [celt-draft] drafts, the internal requirement documents of
   ITU-T G.718 [SG16 314-WP3] and the 3GPP document [TS26.114-830].

   The author would like to thank Henry Sinnreich for his valuable
   feedback and support.

   Funding for this draft has been provided by the University of
   Tuebingen within the "Projektfoerderung fuer Nachwuchswissen-
   schaftler".

   This document was prepared using 2-Word-v2.0.template.dot.

Author's Address

   Christian Hoene
   University of Tuebingen
   WSI-RI
   Sand 13
   72076 Tuebingen
   Germany

   Phone: +49 7071 2970532
   Email: hoene@ieee.org