Audio Video Transport Group
      Internet Draft                                              A. Basso
      Document: draft-basso-avt-videoconreq-00.txt      NMS Communications
                                                                  O. Levin
                                                                 RADVISION
                                                                 N. Ismail
                                                             Cisco Systems
      Expires: January 2004                                      July 2003
   
   
   
             Requirements for transport of video control commands
   
   
   
   Status of this Memo
   
      This document is an Internet-Draft and is in full conformance with
      all provisions of Section 10 of RFC2026 [1].
   
      Internet-Drafts are working documents of the Internet Engineering
      Task Force (IETF), its areas, and its working groups.  Note that
      other groups may also distribute working documents as Internet-
      Drafts.
   
      Internet-Drafts are draft documents valid for a maximum of six
      months and may be updated, replaced, or obsoleted by other
      documents at any time.  It is inappropriate to use Internet-Drafts
      as reference material or to cite them other than as "work in
      progress."
   
      The list of current Internet-Drafts can be accessed at
           http://www.ietf.org/ietf/1id-abstracts.txt
      The list of Internet-Draft Shadow Directories can be accessed at
           http://www.ietf.org/shadow.html.
   
      Copyright Notice
   
          Copyright (C) The Internet Society (1999-2003).  All Rights
      Reserved.
   
   
   Abstract
   
      A variety of video communication services such as video
      conferencing and video messaging rely on the capability of video
      encoders and decoders to exchange control commands. This document
      outlines this set of commands as well as the requirements for
      their transport.
   
   
   
   basso                   Expires  January 2004                [Page 1]


                         Codec Control Requirements             July 2003
   
   
   
   
   Conventions used in this document
   
      The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
      NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and
      "OPTIONAL" in this document are to be interpreted as described in
      RFC-2119 [2].
   
   
   Table of Contents
   
      1. Introduction...................................................2
      2. Background.....................................................3
      3. Video coding...................................................3
      4. Use Cases......................................................3
      5. Codec Commands.................................................5
         5.1 Decoder Control Commands...................................5
         5.2 Encoder Control Commands...................................5
      6. General requirements...........................................6
         6.1 Reuse of Existing Protocols................................6
         6.2 Maintain Existing Protocol Integrity.......................6
         6.3 Avoid Duplicating Existing Protocols.......................6
         6.4 Efficiency.................................................7
      7. Codec Control Requirements.....................................7
         7.1 Reliable versus unreliable delivery........................7
         7.2 Capability description.....................................7
         7.3 Relation with media session................................7
         7.4 Relation with signaling....................................8
         7.5 Bidirectional transport....................................8
         7.6 Extensibility..............................................8
         7.7 Unicast and Multicast Support..............................8
         7.8 Interoperability with other protocols......................8
         7.9 Timely delivery............................................8
      8. Security Considerations........................................9
      9. Acknowledgments................................................9
      References........................................................9
      Author's Addresses...............................................10
   
   
   1. Introduction
   
      A variety of video communication services such as video
      conferencing and video messaging rely on the capability of video
      encoders and decoders to exchange control commands. This document
      outlines this set of commands as well as the requirements for
      their transport.
   
   
   
   
   2. Background
   
      RTP [9] is the protocol of choice for the delivery of real-time
      media. RTCP, the companion control protocol, allows some form of
      monitoring of the media delivery. An enhanced RTCP feedback scheme
      enabling a generic decoder to provide hints to the corresponding
      encoder in case of network losses has been described in [7].
      Similar solutions were provided for specific coding schemes such
      as H.261 [3], H.263 [4] and MPEG-4 [5].
      Currently, there is no standard protocol support that allows a
      given application to exchange control commands with a given codec.
   
   
   3. Video coding
   
      Current coding schemes such as H.261 [3], H.263 [4], MPEG-1,
      MPEG-2, MPEG-4 [5] and H.264 [6] can encode video pictures either
      as reference frames, also known as intra frames, or as predicted
      frames, also known as inter frames. The reference frames can be
      decoded independently of the other frames. The predicted frames
      instead carry only the difference information with respect to one
      or more (as in H.263 Annex N and H.264) reference frames and thus
      can only be decoded if the information relative to the reference
      frames is known.

      Furthermore, video pictures are not coded as a whole but are
      partitioned into small blocks called macroblocks (MBs), and every
      MB is individually coded. MBs are organized in stripes of variable
      size. Depending on the coding standard, such stripes are called
      slices or Groups of Blocks (GOBs).

      The encoder's decision to code a given picture as a reference
      frame or a predicted frame depends on its internal logic and its
      own coding optimization scheme, which is implementation dependent.
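
      The dependency between reference and predicted frames can be
      illustrated with a minimal sketch. It assumes the simplest
      prediction structure, in which each predicted frame depends only
      on the frame immediately before it. The labels "I" and "P" and the
      function name decodable_frames are illustrative assumptions and
      are not taken from any codec specification.

```python
def decodable_frames(frame_types, lost):
    """Return, per frame, whether a decoder can reconstruct it correctly.

    frame_types: sequence of "I" (reference/intra) or "P" (predicted/inter).
    lost: set of frame indices that never reached the decoder.
    Assumption: each "P" frame is predicted from the previous frame only.
    """
    result = []
    previous_ok = False  # nothing has been decoded yet
    for index, frame_type in enumerate(frame_types):
        received = index not in lost
        if frame_type == "I":
            ok = received  # a reference frame is decodable on its own
        else:
            ok = received and previous_ok  # needs an intact predecessor
        result.append(ok)
        previous_ok = ok
    return result


if __name__ == "__main__":
    # Frame 2 is lost, so frames 3 and 4 cannot be reconstructed either;
    # the reference frame at index 5 stops the error propagation.
    print(decodable_frames(["I", "P", "P", "P", "P", "I", "P"], lost={2}))
```

      Under these assumptions a single loss affects every following
      predicted frame until the next reference frame arrives, which is
      why several of the commands in this document request one.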
   
   
   4. Use Cases
   
      This section describes use cases of codec control commands.
   
      1. A use case includes an RTP video mixer composing multiple
      encoded video sources into a single encoded video stream. Each
      time a video source is to be added to the video composition, the
      RTP mixer needs to request an encoded reference frame from the
      video source or a specific area of the picture defined by one or
      more slices.
   
      2. Another use case includes an RTP video mixer that receives
      multiple encoded RTP video streams from conference participants
      and dynamically selects one of the streams to be included in its
      output RTP stream.  For every new video stream selected, the mixer
      will request a reference frame from the remote source in order for
      the receiving endpoints to be able to decode and display the
      output stream smoothly when the switch occurs. The video mixer in
      this scenario will stop the delivery of the current RTP stream and
      it will wait for the reference frame from the source before it
      switches to that source.
   
      3. Another use case includes a given application that needs to
      signal to the remote encoder a request to change the coding
      strategy, asking it to deliver video pictures at a lower frame
      rate but with better picture quality, or vice versa. Such requests
      may be based on input from the end user.

      4. Another use case includes an application that has become aware
      of packet losses and, in order to mitigate their effect, requests
      a reference frame from the remote encoder. A reference frame will
      stop the spatial and temporal propagation of coding errors
      inherent to commonly used predictive video coding schemes.
   
      5. Another use case includes a video mixer that switches its
      output stream to a new video source. The video mixer will instruct
      the receiving endpoints by means of a codec control command to
      complete the decoding of the current frame and then wait for a new
      video reference frame. Concurrently, the video mixer requests a
      reference frame from the new video source and immediately switches
      to the new source. Once the new source receives the request for
      the reference frame and acts on it, the receiving endpoints will
      restart decoding and displaying the new picture.
      The main benefit of this method as opposed to the video mixer
      stopping video transmission of the new source until it detects a
      new reference frame, as in use case 2, is that the video mixer
      does not have to discover the beginning of a reference frame. This
      can simplify the video mixer task especially in the case in which
      the picture has multiple reference frames.
   
      6. Another use case includes a video mixer that dynamically
      selects one of the received video streams to be sent out to
      participants and tries to provide the highest bit rate possible to
      all participants while minimizing stream transrating. One way of
      achieving this is for the mixer to set up sessions with endpoints
      using the maximum bit rate accepted by each endpoint and by the
      call admission method used by the mixer.
      By means of commands that allow flow control, the mixer can then
      reduce the maximum bit rate sent by endpoints to the lowest common
      denominator of all received streams. As the lowest common
      denominator changes due to endpoints joining or leaving, the mixer
      can adjust the limits at which endpoints can send their streams to
      match the new limit.
      The mixer would then request a new maximum bit rate, which is
      equal to or less than the maximum bit rate negotiated at session
      setup, for a specific media stream, and the remote endpoint can
      respond with the actual bit rate that it can support.
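
      The flow control behaviour of use case 6 can be sketched as
      follows. The sketch reads "lowest common denominator" as the
      smallest of the maximum bit rates accepted by the endpoints
      currently in the conference; that reading, the class name
      MixerFlowControl and its methods are assumptions made for
      illustration only.

```python
class MixerFlowControl:
    """Tracks the send limit a mixer would request from its endpoints."""

    def __init__(self):
        # endpoint identifier -> maximum bit rate accepted at session
        # setup, in bit/s (hypothetical bookkeeping, not a wire format)
        self._accepted = {}

    def join(self, endpoint, max_bit_rate):
        """Register an endpoint and return the new common limit."""
        self._accepted[endpoint] = max_bit_rate
        return self.send_limit()

    def leave(self, endpoint):
        """Remove an endpoint and return the new common limit."""
        self._accepted.pop(endpoint, None)
        return self.send_limit()

    def send_limit(self):
        """Lowest accepted maximum bit rate, or None with no endpoints."""
        return min(self._accepted.values(), default=None)
```

      Each time the value returned by join() or leave() changes, the
      mixer would issue a new rate request to the sending endpoints.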
   
   
   5. Codec Commands
   
   5.1 Decoder Control Commands
   
      1. VideoFreezePicture
   
      It instructs the video decoder to complete the decoding of the
      current video frame and subsequently display it until receipt of
      the command to release the frozen picture and resume normal
      decoding and presentation. Note that the freeze picture release
      command is part of the H.261, H.263 and H.264 bitstreams. See use
      case 5 for an example of how such a command might be used.
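
      A minimal sketch of the decoder-side behaviour is given below. It
      assumes that the freeze release is visible to the application as a
      flag on a decoded frame; the class name FreezableDisplay and its
      methods are hypothetical and only illustrate the state change.

```python
class FreezableDisplay:
    """Keeps showing the last picture while a freeze is in effect."""

    def __init__(self):
        self.frozen = False
        self.displayed = None  # the picture currently on screen

    def freeze(self):
        """Handle VideoFreezePicture: hold the picture shown so far."""
        self.frozen = True

    def on_decoded_frame(self, frame, freeze_release=False):
        """Present a decoded frame unless the picture is frozen.

        freeze_release models the release signal carried in the
        bitstream (an assumption about how it is surfaced).
        """
        if self.frozen and not freeze_release:
            return self.displayed  # keep the frozen picture
        self.frozen = False
        self.displayed = frame
        return self.displayed
```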
   
   5.2 Encoder Control Commands
   
      1. videoFastUpdatePicture
   
      It instructs the video encoder to complete the encoding of the
      current video frame and to generate a full reference frame at the
      earliest opportunity. The evaluation of such opportunity includes
      the current encoder coding strategy and the current available
      network resources. Coding schemes that support picture freeze
      release in their bitstreams MUST use freeze release to signal the
      remote end to resume decoding.
      Reference pictures, independently from the instant in time when
      they are encoded, are in general several times larger in size than
      predicted pictures.  Thus in scenarios in which the available
      bandwidth is small the use of a reference picture implies a delay
      that is significantly longer than the typical picture duration.
   
      2. videoFastUpdateGOB(firstGOB, numberOfGOBs)
   
      It instructs the video encoder to perform a fast update of one or
      more GOBs. firstGOB indicates the number of the first GOB to be
      updated, and numberOfGOBs indicates the number of GOBs to be
      updated.
      The term GOB is used here with the same definition given in [4],
      i.e., a Group of Blocks (GOB) is a consecutive number of
      macroblocks in scan order.
      More recent video coding standards have introduced the notion of
      "rectangular slice".  A rectangular slice may lead to sending more
      than one GOB.
      The efficiency of algorithms using the videoFastUpdateGOB command
      is reduced greatly when the command is not transmitted in a timely
      fashion because the motion compensation algorithm at the far-end
      receiver will not necessarily recognize the corrupt data as
      invalid.
   
      3. VideoTemporalSpatialTradeOff(index)

      It instructs the video encoder to change its trade-off between
      temporal and spatial resolution. The index takes values from 0 to
      31, with higher values indicating monotonically a desire for a
      higher frame rate.
      In general the encoder reaction time may be significantly longer
      than the typical picture duration.

      4. RateRequest(MaxBitrate)

      It instructs the far-end encoder to change the maximum bit rate of
      the given media stream being transmitted. MaxBitrate indicates, in
      units of 100 bit/s, the new requested maximum bit rate for the
      associated media stream. The new requested bit rate has to be
      equal to or less than the bit rate negotiated during session
      setup.

      5. RateNotify(MaximumBitRate)

      This message is sent in response to a RateRequest message.
      MaximumBitRate indicates, in units of 100 bit/s, the maximum bit
      rate at which the terminal is going to encode the media stream.
      Note that MaximumBitRate may differ from the requested MaxBitrate.
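
      The RateRequest and RateNotify exchange can be sketched as below.
      The field units (100 bit/s) and the upper bound set by session
      setup come from the text above. The data classes, the helper
      names and the policy that the encoder never answers with more
      than was requested are assumptions made for this illustration.

```python
from dataclasses import dataclass

UNIT_BPS = 100  # both fields are expressed in units of 100 bit/s


@dataclass(frozen=True)
class RateRequest:
    max_bitrate: int  # requested maximum, in units of 100 bit/s


@dataclass(frozen=True)
class RateNotify:
    maximum_bit_rate: int  # rate the encoder will use, same units


def make_rate_request(requested_bps, negotiated_bps):
    """Build a RateRequest, enforcing the session setup upper bound."""
    if requested_bps > negotiated_bps:
        raise ValueError("request exceeds the negotiated bit rate")
    return RateRequest(requested_bps // UNIT_BPS)


def answer_rate_request(request, encoder_capable_bps):
    """Encoder side: report the rate that will actually be used.

    Assumed policy: use the requested rate unless the encoder can
    only produce a lower one.
    """
    chosen_bps = min(request.max_bitrate * UNIT_BPS, encoder_capable_bps)
    return RateNotify(chosen_bps // UNIT_BPS)
```

      For example, a request for 384000 bit/s is carried as the value
      3840, and an encoder limited to 256000 bit/s would answer with
      2560.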
   
   
   6. General requirements
   
   6.1  Reuse of Existing Protocols
   
      The codec control messages should be transported using an already
      existing transport protocol whenever possible. The transport
      protocol should allow at a minimum the leveraging of its security
      elements.
   
   6.2  Maintain Existing Protocol Integrity
   
      In meeting the requirements of Section 7, the codec control
      transport mechanism MUST NOT break existing protocols or cause
      backward compatibility problems.
   
   6.3 Avoid Duplicating Existing Protocols

      The codec control mechanism SHOULD NOT duplicate the functionality
      of existing protocols.  The focus of codec control is new
      functionality not addressed by existing protocols, or the
      extension of existing protocols within the constraints of the
      requirements in Section 7.  Where an existing protocol can be
      gracefully extended to support codec control requirements, such
      extensions are acceptable alternatives for meeting the
      requirements.
   
   6.4 Efficiency
   
      The codec control transport mechanism SHOULD employ protocol
      elements known to result in efficient operation.  Techniques to be
      considered include re-use of transport connections across sessions
      (i.e., codec control messages that control different media
      sessions may be aggregated on one codec control transport channel)
      and piggybacking of responses on requests in the reverse
      direction.
   
   
   7. Codec Control Requirements
   
   7.1 Reliable versus unreliable delivery
   
      The commands VideoFreezePicture and VideoTemporalSpatialTradeOff
      and the flow control commands RateRequest and RateNotify require
      reliable delivery.

      The commands videoFastUpdatePicture and videoFastUpdateGOB imply a
      specific modification of the media, which is delivered in an
      unreliable fashion. Given that the delivery of the media is
      unreliable, the sender cannot rely on the fact that the request
      has been safely delivered but needs to ensure that the requested
      modification of the data (e.g., insertion of a reference frame) is
      received.
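
      One way to read the second paragraph is that the requester treats
      the arrival of the requested media, not an acknowledgment of the
      command, as confirmation. The sketch below repeats the request
      until a reference frame is observed. The function name, the two
      callables and the attempt limit are hypothetical; in a real
      system reference_frame_received would wait on the media stream
      with a timeout.

```python
def request_reference_frame(send_request, reference_frame_received,
                            max_attempts=3):
    """Repeat a fast update request until the media confirms it.

    send_request: callable that transmits one fast update command.
    reference_frame_received: callable returning True once a reference
        frame has been seen in the incoming media.
    Returns the number of attempts used, or None if none succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        send_request()
        if reference_frame_received():
            return attempt
    return None
```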
   
   7.2 Capability description
   
      Codec control capability for each supported message should be
      described and negotiated, for example using SDP offer/answer, for
      both senders and receivers during session setup. The transport
      protocol used for the delivery of these messages should also be
      specified at session setup.
   
   7.3 Relation with media session
   
      The delivery channel of the codec control messages must be
      associated with the media session it controls. Using one codec
      control channel per media session and associating the two channels
      during session setup could achieve this purpose. Alternatively,
      one codec control channel could be used for multiple media
      sessions. In this case the controlled media session MUST be
      identified in each codec control message.
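
      When one control channel serves several media sessions, the
      receiver has to route each message by its session identifier. A
      minimal sketch follows. The message layout (a dictionary with
      "session" and "command" keys) and the class name
      SharedControlChannel are assumptions for illustration and do not
      describe a defined message format.

```python
class SharedControlChannel:
    """Routes codec control messages to the media session they name."""

    def __init__(self):
        self._handlers = {}  # session identifier -> command handler

    def associate(self, session_id, handler):
        """Bind a media session to this channel, e.g. at session setup."""
        self._handlers[session_id] = handler

    def deliver(self, message):
        """Dispatch one message; every message must name its session."""
        session_id = message["session"]
        handler = self._handlers.get(session_id)
        if handler is None:
            raise KeyError("no media session associated: %r" % (session_id,))
        return handler(message["command"])
```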
   
      The transport channel of the codec control messages should follow
      a similar path to that of the media session it controls.
      Inter-operability with other standards for codec control delivery
      might cause a deviation from this requirement.
   
   7.4  Relation with signaling
   
      The codec control transport protocol MUST be independent of the
      signaling protocol used to setup the media.
   
   7.5 Bidirectional transport
   
      Messages can originate from receivers as well as senders; thus the
      transport mechanism must allow bi-directional exchange of
      messages.
   
   7.6 Extensibility
   
      Codec control message syntax should be extensible to easily
      support the addition of new control messages.
   
   7.7 Unicast and Multicast Support
   
      The codec control transport MUST work and scale for media sessions
      that use point-to-point unicast.
   
      The codec control transport MUST work and scale for media sessions
      that use SSM (Source Specific Multicast) and have a small to
      moderate group size.
   
      The codec control transport will not address ASM (Any Source
      Multicast) media sessions in which media sources are not known
      until they start transmission.
   
   7.8 Interoperability with other protocols
   
      The codec control transport protocol MUST allow inter-operability
      with the most commonly deployed IP-based video communication
      protocols, such as H.323, H.324 and H.324M.
   
   7.9 Timely delivery
   
      For some video services the ability to transmit codec control
      commands in a timely fashion is essential to the delivery of a
      high quality user experience. The delay introduced by the
      transport protocol MUST be negligible with respect to the time
      constants of the delivered media stream.
   
   8. Security Considerations
   
      <TODO>
   
   9. Acknowledgments
   
      The authors would like to acknowledge the comments from around the
      community in helping refine this document. Particular recognition
      goes to Roni Evans.
   
   
   References
   
   
      1  Bradner, S., "The Internet Standards Process -- Revision 3",
         BCP 9, RFC 2026, October 1996.

      2  Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, March 1997.

      3  ITU-T Recommendation H.261 (1993), "Video codec for audiovisual
         services at p x 64 kbit/s".

      4  ITU-T Recommendation H.263 (1998), "Video coding for low bit
         rate communication".

      5  ISO/IEC 14496-2:2001/Amd.1:2002, "Information technology -
         Coding of audio-visual objects - Part 2: Visual", 2001.

      6  Joint Video Team of ITU-T and ISO/IEC JTC 1, "Draft ITU-T
         Recommendation and Final Draft International Standard of Joint
         Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC)",
         Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG,
         JVT-G050, March 2003.

      7  J. Ott et al., "Extended RTP Profile for RTCP-based Feedback
         (RTP/AVPF)", draft-ietf-avt-rtcp-feedback-04.txt, June 2002,
         IETF Draft, Work in Progress.

      8  T. Turletti and C. Huitema, "RTP Payload Format for H.261 Video
         Streams", RFC 2032, October 1996.

      9  H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP:
         A Transport Protocol for Real-time Applications", Internet
         Draft, draft-ietf-avt-rtp-new-11.txt, Work in Progress,
         November 2001.
   
   
   
   
   Author's Addresses
   
      Andrea Basso
      NMS Communications
      200 Shultz Drive
      Red Bank, NJ 07701
      Phone: (732) 936-2118
      Email: andrea_basso@nmss.com
   
      Orit Levin
      RADVISION
      266 Harristown Road
      Glen Rock, NJ USA
      Phone:  +1-201-689-6330
      Email:  orit@radvision.com
   
      Nermeen Ismail
      Cisco Systems, Inc.
      170 West Tasman Drive
      San Jose, CA 95134-1706, USA
      Phone: +1 408 853 8714
      Email: nismail@cisco.com