Network Working Group                                         Y.-K. Wang
INTERNET-DRAFT                                                     Nokia
Intended status: Standards Track                              T. Schierl
Expires: May 11, 2008                                     Fraunhofer HHI
                                                       November 12, 2007


                    RTP Payload Format for MVC Video
                      draft-wang-avt-rtp-mvc-00.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 11, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Internet-Draft      RTP Payload Format for MVC Video    November 2007

Abstract
   This memo describes an RTP payload format for the multiview extension
   of the ITU-T Recommendation H.264 video codec which is technically
   identical to ISO/IEC International Standard 14496-10.  The RTP
   payload format allows for packetization of one or more Network
   Abstraction Layer (NAL) units, produced by the video encoder, in each
   RTP payload.  The payload format has wide applicability, such as 3D
   video streaming, free-viewpoint video, and 3DTV.

Table of Contents

   1.     Introduction ............................................... 4
   2.     Conventions ................................................ 4
   3.     The MVC Codec .............................................. 4
   3.1.   Overview ................................................... 4
   3.2.   Parameter Set Concept ...................................... 5
   3.3.   Network Abstraction Layer Unit Header ...................... 6
   4.     Scope ...................................................... 8
   5.     Definitions and Abbreviations .............................. 9
   5.1.   Definitions ................................................ 9
   5.1.1. Definitions per MVC specification .......................... 9
   5.1.2. Definitions local to this memo ............................. 9
   5.2.   Abbreviations ............................................. 10
   6.     MVC RTP Payload Format .................................... 10
   6.1.   Design Principles ......................................... 10
   6.2.   RTP Header Usage .......................................... 11
   6.3.   Common Structure of the RTP Payload Format ................ 11
   6.4.   NAL Unit Header Usage ..................................... 11
   6.5.   Packetization Modes ....................................... 13
   6.6.   Decoding Order Number (DON) ............................... 13
   6.7.   Aggregation Packets ....................................... 13
   6.8.   Fragmentation Units (FUs) ................................. 14
   6.9.   Payload Content Scalability Information (PACSI) NAL Unit .. 14
   7.     Packetization Rules ....................................... 18
   8.     De-Packetization Process (Informative) .................... 19
   9.     Payload Format Parameters ................................. 20
   9.1.   Media Type Registration ................................... 20
   9.2.   SDP Parameters ............................................ 21
   9.2.1. Mapping of Payload Type Parameters to SDP ................. 21
   9.2.2. Usage with the SDP Offer/Answer Model ..................... 22

Wang, Schierl          Expires May 11, 2008              [page 2]


   9.2.3. Usage with Session Multiplexing ........................... 22
   9.2.4. Usage in Declarative Session Descriptions ................. 22
   9.3.   Examples .................................................. 22
   9.4.   Parameter Set Considerations .............................. 22
   10.    Security Considerations ................................... 22
   11.    Congestion Control ........................................ 22
   12.    IANA Considerations ....................................... 22
   13.    References ................................................ 23
   13.1.  Normative References ...................................... 23
   13.2.  Informative References .................................... 24
   14.    Author's Addresses ........................................ 24
   15.    Intellectual Property Statement ........................... 25
   16.    Disclaimer of Validity .................................... 25
   17.    Copyright Statement ....................................... 25
   18.    Acknowledgment ............................................ 26

1.   Introduction

  This memo specifies an RTP [RFC3550] payload format for a forthcoming
  new mode of the H.264/AVC video coding standard, known as Multiview
  Video Coding (MVC).  Formally, MVC takes the form of Amendment 4 to
  ISO/IEC 14496 Part 10 [MPEG4-10], and Annex H of ITU-T Rec. H.264
  [H.264]. The latest draft specification of MVC is available in [MVC].

  MVC covers a wide range of 3D video applications, including 3D video
  streaming, free-viewpoint video as well as 3DTV.

  This memo tries to follow a backward compatible enhancement
  philosophy similar to what the video coding standardization
  committees implement, by keeping as close an alignment to the
  H.264/AVC payload format [RFC3984] as possible.  It documents the
  enhancements relevant from an RTP transport viewpoint, and defines
  signaling support for MVC, including a new media subtype name.

  Due to the similarity between MVC and SVC in system and transport
  aspects, this memo reuses the design principles as well as many
  features of the SVC RTP payload draft [I-D.draft-ietf-avt-svc].  The
  feasibility of specifying this memo as a delta relative to the SVC
  RTP payload draft could be studied in future versions.

2.    Conventions

  The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
  "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
  document are to be interpreted as described in BCP 14, RFC 2119
  [RFC2119].

  This specification uses the notion of setting and clearing a bit when
  bit fields are handled.  Setting a bit is the same as assigning that
  bit the value of 1 (On).  Clearing a bit is the same as assigning
  that bit the value of 0 (Off).

3.    The MVC Codec

3.1.     Overview


  MVC provides multi-view video bitstreams.  An MVC bitstream contains
  a base view conforming to at least one of the profiles of H.264/AVC
  as defined in Annex A of [H.264], and one or more non-base views.  To
  enable high compression efficiency, coding of a non-base view can
  utilize other views for inter-view prediction; decoding such a view
  therefore relies on the presence of the views it depends on.  Each
  coded view may itself be temporally scalable, similar to a scalable
  video coding (SVC) [SVC] bitstream.  Besides temporal scalability,
  MVC also supports view scalability, wherein a subset of the encoded
  views can be extracted, decoded, and displayed whenever the
  application desires.

  The concepts of the video coding layer (VCL) and the network
  abstraction layer (NAL) are inherited from H.264/AVC.  The VCL
  contains the signal processing functionality of the codec:
  mechanisms such as transform, quantization, motion-compensated
  prediction, loop filtering, and inter-view prediction.  The NAL
  encapsulates each slice generated by the VCL into one or more NAL
  units.  Please consult [RFC3984] for a more in-depth discussion of
  the NAL unit concept.  MVC specifies the decoding order of NAL units.

  In MVC, one access unit contains all NAL units pertaining to one
  output time instance for all the views.  Within one access unit, each
  view representation consists of one or more slices.

  The concept of temporal scalability is not newly introduced by SVC or
  MVC, as profiles defined in Annex A of [H.264] already support it.
  In [H.264], sub-sequences were introduced to allow optional use of
  temporal layers.  SVC extended this approach by advertising the
  temporal scalability information within the NAL unit header or in
  prefix NAL units; MVC inherits both mechanisms.

3.2.     Parameter Set Concept

  The parameter set concept was first specified in [H.264].  Please
  refer to section 1.2 of [RFC3984] for more details.  SVC introduced
  some new parameter set mechanisms, as specified in [SVC].  It is
  expected that MVC will inherit the parameter set concept from both
  [H.264] and [SVC].

  In particular, a different type of sequence parameter set (SPS) using
  a different NAL unit type than "the old SPS" specified in [H.264]
  would be used for non-base views, while the base view would still use
  "the old SPS".  Slices from different views would be able to use
  either 1) the same sequence or picture parameter set, or 2) different
  sequence or picture parameter sets.

  The inter-view dependency as well as the decoding order of all the
  encoded views are indicated in a new syntax structure, the SPS MVC
  extension, which is included in the SPS.

3.3.     Network Abstraction Layer Unit Header

  An MVC NAL unit (of type 20 or 14) consists of a header of four
  octets and the payload byte string.  MVC NAL units of type 20 are
  coded slices of non-base views.  A special type of MVC NAL unit is
  the prefix NAL unit (type 14), which includes descriptive information
  about the associated H.264/AVC VCL NAL unit (type 1 or 5) that
  immediately follows the prefix NAL unit.

  MVC extends the one-byte H.264/AVC NAL unit header by three
  additional octets.  The header indicates the type of the NAL unit,
  the (potential) presence of bit errors or syntax violations in the
  NAL unit payload, information regarding the relative importance of
  the NAL unit for the decoding process, the view identification
  information, the temporal layer identification information, and other
  fields as discussed below.

  The syntax and semantics of the NAL unit header are formally
  specified in [MVC], but the essential properties of the NAL unit
  header are summarized below.

  The first byte of the NAL unit header has the following format (the
  bit fields are the same as defined for the one-byte H.264/AVC NAL
  unit header, while the semantics of some fields have changed
  slightly, in a backward compatible way):

        +---------------+
        |0|1|2|3|4|5|6|7|
        +-+-+-+-+-+-+-+-+
        |F|NRI|  Type   |
        +---------------+

  F: 1 bit
  forbidden_zero_bit.  H.264/AVC declares a value of 1 as a syntax
  violation.

  NRI: 2 bits
  nal_ref_idc.  A value of 00 indicates that the content of the NAL
  unit is not used to reconstruct reference pictures for future
  prediction.  Such NAL units can be discarded without risking the
  integrity of the reference pictures in the same view.  A value
  greater than 00 indicates that the decoding of the NAL unit is
  required to maintain the integrity of reference pictures in the same
  view, or that the NAL unit contains parameter sets.

  Type: 5 bits
  nal_unit_type.  This component specifies the NAL unit type.
  In H.264/AVC, NAL unit types 14 and 20 are reserved for future
  extensions.  MVC uses these two NAL unit types: type 14 is used for
  the prefix NAL unit, and type 20 is used for a coded slice of a
  non-base view.  NAL unit types 14 and 20 indicate the presence of
  three additional octets in the NAL unit header, as shown below.

           +---------------+---------------+---------------+
           |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
           +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           |S|   PRID    | TID |A|      VID          |I|V|R|
           +---------------+---------------+---------------+

  S: 1 bit
  svc_mvc_flag.  This flag specifies whether the NAL unit is an SVC NAL
  unit (when equal to 0) or an MVC NAL unit (when equal to 1).

  PRID: 6 bits
  priority_id.  This component specifies a priority identifier for the
  NAL unit.  A lower value of PRID indicates a higher priority.

  TID: 3 bits
  temporal_id.  This component specifies the temporal layer (or frame
  rate) hierarchy.  Informally put, a layer consisting of view
  representations with a smaller temporal_id corresponds to a lower
  frame rate.  A given temporal layer typically depends on the lower
  temporal layers (i.e., the temporal layers with smaller temporal_id
  values) but never depends on any higher temporal layer.

  A: 1 bit
  anchor_pic_flag.  This component specifies whether the view
  representation is an anchor picture (when equal to 1) or not (when
  equal to 0), as specified in [MVC].

  VID: 10 bits
  view_id.  This component specifies the view identifier of the view.

  I: 1 bit
  idr_flag.  This component specifies whether the view representation
  is a view instantaneous decoding refresh (V-IDR) picture for the view
  (when equal to 1) or not (when equal to 0), as specified in [MVC].

  V: 1 bit
  inter_view_flag.  This component specifies whether the view
  representation is used for inter-view prediction (when equal to 1) or
  not (when equal to 0).

  R: 1 bit
  reserved_zero_one_bit.  Reserved bit for future extension.  R MUST be
  equal to 0.  Receivers SHOULD discard NAL units with R equal to 1.

  This memo reuses the same additional NAL unit types introduced in RFC
  3984, which are presented in section 6.3.  In addition, this memo
  introduces one more NAL unit type, 30, as specified in section 6.9.
  These NAL unit types are marked as unspecified in [MVC] and
  intentionally reserved for use in systems specifications like this
  memo.  Moreover, this specification extends the semantics of F, NRI,
  PRID, TID, A, and I as described in section 6.4.
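
  As an illustration of the header layout summarized in this section,
  the following Python sketch unpacks the four header octets of an MVC
  NAL unit into named fields.  The names MvcNalHeader and
  parse_mvc_nal_header are hypothetical and are not defined by [MVC]
  or by this memo; the bit positions are taken from the two diagrams
  above.

```python
from typing import NamedTuple


class MvcNalHeader(NamedTuple):
    """Fields of the four-octet NAL unit header (hypothetical container)."""
    f: int
    nri: int
    nal_type: int
    s: int
    prid: int
    tid: int
    a: int
    vid: int
    i: int
    v: int
    r: int


def parse_mvc_nal_header(data: bytes) -> MvcNalHeader:
    """Unpack F/NRI/Type and the three additional header octets."""
    if len(data) < 4:
        raise ValueError("need at least four header octets")
    first = data[0]
    nal_type = first & 0x1F
    # Only NAL unit types 14 and 20 carry the three additional octets.
    if nal_type not in (14, 20):
        raise ValueError("not an MVC NAL unit type")
    ext = int.from_bytes(data[1:4], "big")  # 24 bits; S is the top bit
    return MvcNalHeader(
        f=first >> 7,
        nri=(first >> 5) & 0x03,
        nal_type=nal_type,
        s=(ext >> 23) & 0x01,
        prid=(ext >> 17) & 0x3F,
        tid=(ext >> 14) & 0x07,
        a=(ext >> 13) & 0x01,
        vid=(ext >> 3) & 0x3FF,
        i=(ext >> 2) & 0x01,
        v=(ext >> 1) & 0x01,
        r=ext & 0x01,
    )
```

  A receiver applying the rule for the R bit could, for example,
  discard the NAL unit whenever the returned r field equals 1.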

4.    Scope

  This payload specification can only be used to carry the "naked" NAL
  unit stream over RTP, and not the byte stream format according to
  Annex B of [MVC].  The applications of this specification will
  likely be in IP-based multimedia communications fields, including 3D
  video streaming over IP, free-viewpoint video over IP, and 3DTV over
  IP.

  This specification allows a given RTP session to carry NAL units
  belonging to

    o the base view only (specified in detail in [RFC3984]), or
    o one or more non-base views, or
    o the base view and one or more non-base views.

5.    Definitions and Abbreviations

5.1.     Definitions

5.1.1.    Definitions per MVC specification

  This document uses the definitions of [MVC].  The following terms,
  defined in [SVC], are summarized here for convenience:

  access unit:  A set of NAL units pertaining to a certain temporal
  location.  An access unit includes the coded slices of all the views
  at that temporal location and possibly other associated data, e.g.
  supplemental enhancement information (SEI) messages and parameter
  sets.

  prefix NAL unit:  A NAL unit with nal_unit_type equal to 14 that
  immediately precedes a NAL unit with nal_unit_type equal to 1, 5,
  or 12.  The NAL unit that succeeds the prefix NAL unit is also
  referred to as the associated NAL unit.  The prefix NAL unit contains
  data associated with the associated NAL unit, which are considered to
  be part of the associated NAL unit.

5.1.2.    Definitions local to this memo

  MVC NAL unit:  A NAL unit of NAL unit type 14 or 20 as specified in
  Annex H of [MVC].  An MVC NAL unit has a four-byte NAL unit header.

  operation point:  An operation point of an MVC bitstream represents a
  certain level of temporal and view scalability.  An operation point
  contains only those NAL units required for a valid bitstream to
  represent a certain subset of views at a certain temporal level.  An
  operation point is described by the view_id values of the subset of
  views, and the highest temporal_id.
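
  The operation point definition can be read as a simple filter over
  NAL units.  The Python sketch below is illustrative only: NalInfo,
  OperationPoint, and extract_operation_point are hypothetical names,
  and the sketch assumes that each NAL unit has already been annotated
  with its view_id and temporal_id (for base view NAL units these
  would come from the associated prefix NAL unit).  It also assumes
  that the caller lists every view needed for inter-view prediction in
  the set of view_id values.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class NalInfo:
    """A NAL unit annotated with its view and temporal identifiers."""
    view_id: int
    temporal_id: int
    payload: bytes = b""


@dataclass(frozen=True)
class OperationPoint:
    """Described by a set of view_id values and the highest temporal_id."""
    view_ids: FrozenSet[int]
    max_temporal_id: int


def extract_operation_point(nals: Iterable[NalInfo],
                            op: OperationPoint) -> List[NalInfo]:
    """Keep NAL units of the selected views up to the temporal limit."""
    return [
        n for n in nals
        if n.view_id in op.view_ids and n.temporal_id <= op.max_temporal_id
    ]
```

  The filter preserves the input order, so NAL units supplied in
  decoding order remain in decoding order after extraction.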

  temporal scalable layer switching point:  A view representation such
  that it and all subsequent view representations with the same value
  of temporal_id and view_id in decoding order do not refer to any
  preceding view representation with the same value of temporal_id and
  view_id in decoding order for inter prediction.  Such a view
  representation can be used to switch from the next lower temporal
  layer to the current temporal layer when operating at the same
  view_id.

  session multiplexing:  The views of the MVC bitstream are distributed
  onto different RTP sessions, whereby each RTP session carries a
  single RTP packet stream.  Each RTP session requires separate
  signaling and has a separate timestamp, sequence number, and SSRC
  space.  Dependency between sessions MUST be signaled according to
  [I-D.draft-ietf-mmusic-decoding-dependency] and this memo.


5.2.     Abbreviations

  In addition to the abbreviations defined in [RFC3984], the following
  ones are defined.

  MVC:       Multiview Video Coding
  CL-DON:    Cross-Layer Decoding Order Number
  PACSI:     Payload Content Scalability Information

6.    MVC RTP Payload Format

6.1.     Design Principles

  The following design principles have been observed:

  o Backward compatibility with [RFC3984] wherever possible.

  o As the MVC base view is H.264/AVC compatible, the base view or any
  subset, when transmitted in its own session, MUST be encapsulated
  using [RFC3984], and dependency between sessions MUST be signaled
  according to [I-D.draft-ietf-mmusic-decoding-dependency].  This
  requirement has the desirable side effect that such a session can be
  used by [RFC3984] legacy devices.

  o MANEs are signaling aware and rely on signaling information.  MANEs
  have state.

  o MANEs can aggregate multiple RTP streams, possibly from multiple
  RTP sessions.

  o MANEs can perform media-aware stream thinning.  By using the
  payload header information identifying Layers within an RTP session,
  MANEs are able to remove packets from the incoming RTP packet stream.
  This implies rewriting the RTP headers of the outgoing packet stream
  and rewriting of RTCP Receiver Reports.
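
  The stream thinning principle can be sketched as follows.  This
  Python fragment is an illustration under simplifying assumptions,
  not a MANE implementation: thin_stream and the keep predicate are
  hypothetical names, packets are modeled as (sequence number,
  payload) pairs, and the rewriting of RTCP Receiver Reports mentioned
  above is not modeled.  It shows only why outgoing RTP sequence
  numbers have to be rewritten once packets are removed.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

Packet = Tuple[int, bytes]  # (16-bit RTP sequence number, payload)


def thin_stream(packets: Iterable[Packet],
                keep: Callable[[bytes], bool]) -> Iterator[Packet]:
    """Drop packets rejected by keep() and renumber the survivors."""
    next_seq: Optional[int] = None
    for seq, payload in packets:
        if not keep(payload):
            continue  # removed packet: receivers must not see a gap
        if next_seq is None:
            next_seq = seq  # start from the first forwarded packet
        yield next_seq, payload
        next_seq = (next_seq + 1) & 0xFFFF  # 16-bit wrap-around
```

  In practice the keep predicate would inspect payload header fields
  such as TID or VID; here it is left to the caller.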

6.2.     RTP Header Usage

  Please see section 5.1 of [RFC3984].

6.3.     Common Structure of the RTP Payload Format

  Please see section 5.2 of [RFC3984].

6.4.     NAL Unit Header Usage

  The structure and semantics of the NAL unit header were introduced in
  section 3.3.  This section specifies the semantics of F, NRI, PRID,
  TID, A, and I according to this specification.

  Note that, in the context of this section, "protecting a NAL unit"
  means any RTP or network transport mechanism that could improve the
  probability of successful delivery of the packet conveying the NAL
  unit, including applying a QoS-enabled network, FEC, retransmissions,
  and advanced scheduling behavior, whenever possible.

  The semantics of F specified in section 5.3 of [RFC3984] also apply
  herein.

  For NRI, for a bitstream conforming to one of the profiles defined in
  Annex A of [H.264] and transported using [RFC3984], the semantics
  specified in section 5.3 of [RFC3984] are applicable, i.e., NRI also
  indicates the relative importance of NAL units.  In the MVC context,
  the semantics specified in Annex H of [MVC] are applicable; in
  addition, NRI also indicates the relative importance of NAL units
  within a view.  MANEs MAY use this information to protect more
  important NAL units better than less important NAL units.

  For PRID, the semantics specified in Annex H of [MVC] apply.  Note
  that MANEs implementing unequal error protection MAY use this
  information to protect NAL units with smaller PRID values better than
  those with larger PRID values, for example by including only the more
  important NAL units in a forward error correction (FEC) protection
  mechanism.  The importance for the decoding process decreases as the
  PRID value increases.

  For TID, in addition to the semantics specified in Annex H of [MVC],
  according to this memo, values of TID indicate the relative
  importance.  A lower value of TID indicates a higher importance for a
  certain view.  MANEs MAY use this information to protect more
  important NAL units better than less important NAL units.

  For A, in addition to the semantics specified in Annex H of [MVC],
  according to this memo, MANEs MAY use this information to protect NAL
  units with A equal to 1 better than NAL units with A equal to 0.
  MANEs MAY also use NAL units with A equal to 1 to decide when to
  forward more packets for an RTP packet stream.  For example, when a
  MANE detects that view switching has occurred such that the operation
  point has changed, it MAY start to forward NAL units for a new target
  view only after forwarding a NAL unit with A equal to 1 for the new
  target view.

  For I, in addition to the semantics specified in Annex H of [MVC],
  according to this memo, MANEs MAY use this information to protect NAL
  units with I equal to 1 better than NAL units with I equal to 0.
  MANEs MAY also use NAL units with I equal to 1 to decide when to
  forward more packets for an RTP packet stream.  For example, when a
  MANE detects that view switching has occurred such that the operation
  point has changed, it MAY start to forward NAL units for a new target
  view only after forwarding a NAL unit with I equal to 1 for the new
  target view.
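
  The view switching behavior permitted for the A and I bits can be
  pictured as a small gate per view.  The following Python sketch is
  one possible reading, not a required MANE design: ViewSwitchGate and
  its methods are hypothetical names, and how the MANE learns that a
  view switch was requested is outside the sketch.

```python
from typing import Iterable, Set


class ViewSwitchGate:
    """Hold back a newly requested view until A or I equals 1."""

    def __init__(self, active_views: Iterable[int]) -> None:
        self._active: Set[int] = set(active_views)
        self._pending: Set[int] = set()

    def request_view(self, view_id: int) -> None:
        """Record that the operation point now includes view_id."""
        if view_id not in self._active:
            self._pending.add(view_id)

    def should_forward(self, view_id: int, a_bit: int, i_bit: int) -> bool:
        """Decide whether a NAL unit of view_id is forwarded."""
        if view_id in self._active:
            return True
        if view_id in self._pending and (a_bit == 1 or i_bit == 1):
            # The first anchor or V-IDR NAL unit opens the gate.
            self._pending.discard(view_id)
            self._active.add(view_id)
            return True
        return False
```

  NAL units of a pending view that arrive before the gate opens are
  simply not forwarded; views that were never requested stay closed.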

6.5.     Packetization Modes

  Please see section 5.4 of [RFC3984].

6.6.     Decoding Order Number (DON)

  Please see section 5.5 of [RFC3984].  The following applies in
  addition.

  If different views of an MVC bitstream are transported in more than
  one RTP session and interleaved mode is used, the DON values of all
  the NAL units in the RTP sessions using interleaved mode MUST
  indicate CL-DON values.

  When different views of an MVC bitstream are transported in more than
  one RTP session, at least one STAP-A packet is present in any of the
  RTP sessions, and interleaved mode is used in at least one of the RTP
  sessions, the following applies:

  o A PACSI NAL unit MUST be present in each STAP-A packet.

  o A CL-DON field MUST be present in the PACSI NAL unit included in an
  STAP-A.

  o The DON values for the NAL units in each STAP-A packet MUST be
  derived as follows and indicate CL-DON values.  The CL-DON field in
  the PACSI NAL unit specifies the value of DON for the first NAL unit
  in the STAP-A in transmission order.  For each successive NAL unit in
  appearance order in the STAP-A, the value of DON is equal to (the
  value of DON of the previous NAL unit in the STAP-A + 1) % 65536,
  wherein '%' stands for modulo operation.
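
  The DON derivation for an STAP-A can be written out directly.  In
  this Python sketch the function name stap_a_don_values is
  hypothetical; cl_don stands for the CL-DON field of the PACSI NAL
  unit, and nal_count is assumed to be the number of NAL units in the
  STAP-A to which DON values are assigned.

```python
from typing import List


def stap_a_don_values(cl_don: int, nal_count: int) -> List[int]:
    """Return DON values for NAL units in appearance order.

    The first NAL unit takes the CL-DON value; each successive NAL
    unit takes (previous DON + 1) % 65536.
    """
    if not 0 <= cl_don <= 0xFFFF:
        raise ValueError("CL-DON must fit in 16 bits")
    if nal_count < 0:
        raise ValueError("nal_count must not be negative")
    return [(cl_don + index) % 65536 for index in range(nal_count)]
```

  For instance, a CL-DON of 65534 with four NAL units yields the DON
  values 65534, 65535, 0, and 1, which shows the wrap-around.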

6.7.     Aggregation Packets


  Please see section 5.7 of [RFC3984].

6.8.     Fragmentation Units (FUs)

  Please see section 5.8 of [RFC3984].

6.9.     Payload Content Scalability Information (PACSI) NAL Unit

  A new NAL unit type is specified in this memo, referred to as the
  payload content scalability information (PACSI) NAL unit.  The PACSI
  NAL unit, if present, MUST be the first NAL unit in an aggregation
  packet, and it MUST NOT be present in other types of packets.  The
  PACSI NAL unit indicates view and temporal scalability information
  and other characteristics that are common to all the remaining NAL
  units in the payload of the aggregation packet.  Furthermore, a PACSI
  NAL unit MAY include a CL-DON field and contain zero or more SEI NAL
  units.  The PACSI NAL unit makes it easier for MANEs to decide
  whether to forward, process, or discard the aggregation packet
  containing the PACSI NAL unit.  Senders MAY create PACSI NAL units,
  and receivers MAY ignore them or use them as hints to enable
  efficient aggregation packet processing.  Note that the NAL unit type
  for the PACSI NAL unit is selected among those values that are
  unspecified in [MVC] and [RFC3984].

  When the first aggregation unit of an aggregation packet contains a
  PACSI NAL unit, there MUST be at least one additional aggregation
  unit present in the same packet.  The RTP header and payload header
  fields of the aggregation packet are set according to the remaining
  NAL units in the aggregation packet.

  When a PACSI NAL unit is included in a multi-time aggregation packet
  (MTAP), the decoding order number (DON) for the PACSI NAL unit MUST
  be set to indicate that the PACSI NAL unit has an identical DON to
  the first NAL unit in decoding order among the remaining NAL units in
  the aggregation packet.

  The structure of a PACSI NAL unit is as follows.  The first four
  octets are exactly the same as the four-byte MVC NAL unit header
  discussed in section 3.3.  They are followed by two octets that are
  always present, two optional octets, and zero or more SEI NAL units,
  each SEI NAL unit preceded by a 16-bit unsigned size field (in
  network byte order) that indicates the size of the following NAL unit
  in bytes (excluding these two octets, but including the NAL unit type
  octet of the SEI NAL unit).  Figure 1 illustrates the PACSI NAL unit
  structure and an example of a PACSI NAL unit containing two SEI NAL
  units.

  The bits T, P, C, S, and E are specified only if the bit X is equal
  to 1.  The field CL-DON MUST NOT be present if the aggregation packet
  containing the PACSI NAL unit is not an STAP-A packet.  The field
  CL-DON MAY be present if the aggregation packet containing the PACSI
  NAL unit is an STAP-A packet.

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |F|NRI|  Type   |S|   PRID    | TID |A|      VID          |I|V|R|
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |X|RR |T|P|C|S|E|    RRR        |        CL-DON (optional)      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |        NAL unit size 1        |                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         SEI NAL unit 1        |
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |        NAL unit size 2        |                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   SEI NAL unit 2              |
     |                                                               |
     |                                               +-+-+-+-+-+-+-+-+
     |                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 1.  PACSI NAL unit structure
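
  To make the layout in Figure 1 concrete, the following Python sketch
  splits a PACSI NAL unit into its flag bits, the optional CL-DON, and
  the contained SEI NAL units.  The names Pacsi and parse_pacsi are
  hypothetical.  The sketch assumes that the caller already knows
  whether the CL-DON field is present and passes this as has_cl_don;
  it does not try to infer that from the octets themselves.

```python
import struct
from typing import List, NamedTuple, Optional

PACSI_NAL_TYPE = 30


class Pacsi(NamedTuple):
    """Decoded PACSI fields following the four header octets."""
    x: int
    t: int
    p: int
    c: int
    s: int
    e: int
    cl_don: Optional[int]
    sei_nal_units: List[bytes]


def parse_pacsi(nal: bytes, has_cl_don: bool) -> Pacsi:
    """Parse flag octets, optional CL-DON, and size-prefixed SEI units."""
    fixed_len = 6 + (2 if has_cl_don else 0)
    if len(nal) < fixed_len:
        raise ValueError("truncated PACSI NAL unit")
    if nal[0] & 0x1F != PACSI_NAL_TYPE:
        raise ValueError("not a PACSI NAL unit")
    flags = nal[4]  # X | RR | T | P | C | S | E; nal[5] holds RRR
    cl_don = struct.unpack_from("!H", nal, 6)[0] if has_cl_don else None
    pos = fixed_len
    sei_nal_units: List[bytes] = []
    while pos < len(nal):
        if pos + 2 > len(nal):
            raise ValueError("truncated NAL unit size field")
        (size,) = struct.unpack_from("!H", nal, pos)  # network byte order
        pos += 2
        if pos + size > len(nal):
            raise ValueError("truncated SEI NAL unit")
        sei_nal_units.append(nal[pos:pos + size])
        pos += size
    return Pacsi(
        x=flags >> 7,
        t=(flags >> 4) & 0x01,
        p=(flags >> 3) & 0x01,
        c=(flags >> 2) & 0x01,
        s=(flags >> 1) & 0x01,
        e=flags & 0x01,
        cl_don=cl_don,
        sei_nal_units=sei_nal_units,
    )
```

  A receiver would consult the T, P, C, S, and E values only when the
  returned x field equals 1.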

  The values of the fields in PACSI NAL unit MUST be set as follows.

  o The F bit MUST be set to 1 if the F bit in at least one of the
  remaining NAL units in the payload is equal to 1.  Otherwise, the F
  bit MUST be set to 0.

  o The NRI field MUST be set to the highest value of NRI field among
  all the remaining NAL units in the payload.

  o The Type field MUST be set to 30.

  o The S bit MUST be set to 1.

  o The PRID field MUST be set to the lowest value of the PRID values
  of all the remaining NAL units in the payload.

  o The TID field MUST be set to the lowest value of the TID values of
  all the remaining NAL units with the lowest value of DID in the
  payload.

  o The A bit MUST be set to 1 if the A bit of at least one of the
  remaining NAL units in the payload is equal to 1.  Otherwise, the A
  bit MUST be set to 0.

  o The V bit MUST be set to 1 if the V bit of at least one of the
  remaining NAL units in the payload is equal to 1.  Otherwise, the V
  bit MUST be set to 0.

  o The I bit MUST be set to 1 if the I bit of at least one of the
  remaining NAL units in the payload is equal to 1.  Otherwise, the I
  bit MUST be set to 0.

  o The R bit MUST be set to 0.

  o If the X bit is equal to 1, the bits T, P, C, S, and E are
  specified as below.  Otherwise, the bits T, P, C, S, and E are
  unspecified, and receivers MUST ignore these bits.  The X bit SHOULD
  be identical for all the PACSI NAL units involved in all the RTP
  sessions conveying an MVC bitstream.

  o The RR field MUST be set to '00'.

  o The T bit MUST be set to 1 if all the target NAL units (as defined
  above) belong to temporal scalable layer switching points.
  Otherwise, the T bit MUST be set to 0.  The T bit SHOULD be identical
  for all the PACSI NAL units for which the target NAL units belong to
  the same access unit.


  o The P bit MUST be set to 1 if all the target NAL units (as defined
  above) are with redundant_pic_cnt greater than 0, i.e. the slices are
  redundant slices.  Otherwise, the P bit MUST be set to 0.  The P bit
  SHOULD be identical for all the PACSI NAL units for which the target
  NAL units belong to the same access unit.

  o The C bit MUST be set to 1 if the target NAL units (as defined
  above) belong to an access unit for which the view representations
  are intra view representations.  Otherwise, the C bit MUST be set to
  0.  The C bit SHOULD be identical for all the PACSI NAL units for
  which the target NAL units belong to the same access unit.

  o The S bit MUST be set to 1, if the first VCL NAL unit, in decoding
  order, of the view representation containing the first NAL unit
  following the PACSI NAL unit in the aggregation packet is present in
  the payload.  Otherwise, the S bit MUST be set to 0.

  o The E bit MUST be set to 1, if the last VCL NAL unit, in decoding
  order, of the view representation containing the first NAL unit
  following the PACSI NAL unit in the aggregation packet is present in
  the payload.  Otherwise, the E bit MUST be set to 0.

  o The RRR field MUST be set to '00000000'.

  o When present, the field CL-DON indicates the cross-layer decoding
  order number for the first NAL unit in the STAP-A in transmission
  order.
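
  The bit rules above can be sketched in code.  The following is a
  minimal illustration, not part of the payload format: the class
  NalFlags, its field names, and the function derive_pacsi_bits are
  hypothetical, and the caller is assumed to have already determined
  the remaining NAL units and the target NAL units.  The T and P
  values are only meaningful when the X bit is equal to 1.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class NalFlags:
    """Per-NAL-unit flags; all field names are hypothetical."""
    a: bool                  # A bit of the NAL unit
    v: bool                  # V bit of the NAL unit
    i: bool                  # I bit of the NAL unit
    switching_point: bool    # belongs to a temporal layer switching point
    redundant_pic_cnt: int   # redundant_pic_cnt of the slice


def derive_pacsi_bits(remaining: Sequence[NalFlags],
                      targets: Sequence[NalFlags]) -> Dict[str, int]:
    """Derive a subset of the PACSI bits from the NAL units in a payload."""
    if not remaining or not targets:
        # A PACSI NAL unit must be followed by at least one NAL unit.
        raise ValueError("at least one remaining and one target NAL unit")
    return {
        # A, V and I are set if at least one remaining NAL unit sets them.
        "A": int(any(n.a for n in remaining)),
        "V": int(any(n.v for n in remaining)),
        "I": int(any(n.i for n in remaining)),
        # R is always zero.
        "R": 0,
        # T and P are set only if the condition holds for all targets.
        "T": int(all(n.switching_point for n in targets)),
        "P": int(all(n.redundant_pic_cnt > 0 for n in targets)),
    }
```

  The sketch leaves out the C, S, and E bits, whose conditions depend
  on access unit and view representation boundaries that are not
  modelled here.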

  SEI NAL units included in the PACSI NAL unit, if any, MUST contain a
  subset of the SEI messages associated with the access unit of the
  first NAL unit following the PACSI NAL unit within the aggregation
  packet.

  Informative note: Senders may repeat in the PACSI NAL unit those SEI
  NAL units whose presence in more than one packet is essential for
  packet loss robustness.  Receivers may use the repeated SEI messages
  in place of missing SEI messages.

  The same SEI message SHOULD NOT be included both in a PACSI NAL unit
  and in one of the remaining NAL units contained in the same
  aggregation packet.

7.    Packetization Rules

  Section 6 of [RFC3984] applies.  The following rules apply in
  addition.

  All receivers MUST support the single NAL unit packetization mode to
  provide backward compatibility to endpoints supporting only the
  single NAL unit mode of RFC 3984.  However, the single NAL unit
  packetization mode SHOULD NOT be used whenever it can be avoided,
  because encapsulating small NAL units, e.g. NAL units containing
  parameter sets or SEI messages, in their own packets is typically
  less efficient due to the relatively large per-packet overhead.

  All receivers MUST support the non-interleaved packetization mode.

  Informative note: The non-interleaved mode allows an application to
  encapsulate a single NAL unit in a single RTP packet.  Historically,
  the single NAL unit mode was included in [RFC3984] only for
  compatibility with ITU-T Rec. H.241 Annex A [H.241].  There is no
  point in carrying this historic ballast towards a new application
  space such as the one provided with MVC.  More technically speaking,
  the implementation complexity increase for providing the additional
  mechanisms of the non-interleaved mode (namely STAP-A) is so minor,
  and the benefits are so great, that STAP-A implementation is
  required.

  A NAL unit of small size SHOULD be encapsulated in an aggregation
  packet together with one or more other NAL units.  For example,
  non-VCL NAL units such as access unit delimiters, parameter sets, or
  SEI NAL units are typically small.
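
  As an illustration of such aggregation, the following sketch builds a
  STAP-A payload following the layout in RFC 3984: a one-byte payload
  header with type 24, then a 16-bit size in network byte order in
  front of each NAL unit.  The function name build_stap_a is
  hypothetical, and MTU checks as well as the PACSI NAL unit defined in
  this memo are deliberately left out.

```python
import struct
from typing import Sequence

STAP_A_TYPE = 24  # NAL unit type of STAP-A in RFC 3984


def build_stap_a(nal_units: Sequence[bytes]) -> bytes:
    """Aggregate complete NAL units into a single STAP-A payload."""
    if not nal_units or any(len(n) == 0 for n in nal_units):
        raise ValueError("need at least one non-empty NAL unit")
    f_bit = 0
    nri = 0
    for nal in nal_units:
        # F is set if any aggregated NAL unit has its F bit set.
        f_bit |= nal[0] >> 7
        # NRI is the maximum NRI of all aggregated NAL units.
        nri = max(nri, (nal[0] >> 5) & 0x3)
    payload = bytearray([(f_bit << 7) | (nri << 5) | STAP_A_TYPE])
    for nal in nal_units:
        if len(nal) > 0xFFFF:
            raise ValueError("NAL unit too large for a 16-bit size field")
        # Each aggregation unit is a 16-bit size followed by the NAL unit.
        payload += struct.pack("!H", len(nal)) + nal
    return bytes(payload)
```

  For instance, aggregating a two-byte and a three-byte parameter set
  NAL unit yields one ten-byte payload instead of two separate RTP
  packets, each of which would carry its own RTP header.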

  A prefix NAL unit SHOULD be aggregated into the same packet as the
  associated NAL unit following the prefix NAL unit in decoding order.

  When the first aggregation unit of an aggregation packet contains a
  PACSI NAL unit, there MUST be at least one additional aggregation
  unit present in the same packet.

  When an MVC bitstream is transported in more than one RTP session,
  the following applies.

  o Interleaved mode SHOULD be used for all the RTP sessions.

  o An RTP session that does not use interleaved mode SHOULD be
  constrained as follows.

    - Non-interleaved mode MUST be used.
    - STAP-A MUST be used, and any other type of packet MUST NOT be
      used.
    - Each STAP-A MUST contain a PACSI NAL unit, and the CL-DON field
      MUST be present in the PACSI NAL unit.
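
  The constraints above lend themselves to a simple conformance check.
  The sketch below is illustrative only: PacketInfo and check_session
  are hypothetical names, the caller is assumed to have already parsed
  each payload, and the numeric values (payload type 24 for STAP-A,
  packetization mode 1 for non-interleaved mode) are taken from RFC
  3984 rather than from this memo.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

STAP_A_TYPE = 24          # STAP-A payload type number from RFC 3984
NON_INTERLEAVED_MODE = 1  # packetization-mode value from RFC 3984


@dataclass
class PacketInfo:
    """Summary of one RTP payload (hypothetical structure)."""
    payload_type: int      # NAL unit type in the payload header
    has_pacsi: bool        # True if a PACSI NAL unit is present
    cl_don: Optional[int]  # CL-DON value, or None if the field is absent


def check_session(mode: int, packets: Sequence[PacketInfo]) -> List[str]:
    """List rule violations for a session not using interleaved mode."""
    problems: List[str] = []
    if mode != NON_INTERLEAVED_MODE:
        problems.append("session must use non-interleaved mode")
    for index, pkt in enumerate(packets):
        if pkt.payload_type != STAP_A_TYPE:
            problems.append(f"packet {index}: not a STAP-A")
        elif not pkt.has_pacsi:
            problems.append(f"packet {index}: STAP-A without PACSI NAL unit")
        elif pkt.cl_don is None:
            problems.append(f"packet {index}: PACSI without CL-DON")
    return problems
```

  An empty result means that none of the three constraints was found
  to be violated for the packets inspected.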

  Informative note: The motivation for these constraints is to allow
  the use of non-interleaved mode for the session conveying the
  H.264/AVC compatible view, such that RFC 3984 receivers without
  interleaved mode implementation can subscribe to the base view
  session.

  Non-VCL NAL units SHOULD be conveyed in the same session as the
  associated VCL NAL units.  To meet this, SEI messages that are
  contained in a scalable nesting SEI message and are applicable to
  more than one session SHOULD be separated and placed into multiple
  scalable nesting SEI messages.  The CL-DON values MUST indicate the
  cross-layer decoding order number values as if all these SEI messages
  were in separate scalable nesting SEI messages and contained at the
  beginning of the corresponding access units as specified in [MVC].

8.    De-Packetization Process (Informative)

  For a single RTP session, the de-packetization process specified in
  section 7 of [RFC3984] applies.

  For a receiver that receives more than one of the RTP sessions
  conveying an MVC bitstream, an example of a suitable implementation
  of the de-packetization process is to be specified.

9.    Payload Format Parameters

  This section specifies the parameters that MAY be used to select
  optional features of the payload format and certain features of the
  bitstream.  The parameters are specified here as part of the media
  type registration for the MVC codec.  A mapping of the parameters
  into the Session Description Protocol (SDP) [RFC4566] is also
  provided for applications that use SDP.  Equivalent parameters could
  be defined elsewhere for use with control protocols that do not use
  SDP.

9.1.     Media Type Registration

  The media subtype for the MVC codec is allocated from the IETF tree.

  The receiver MUST ignore any unspecified parameter.

  Informative note: Requiring receivers to ignore unspecified
  parameters allows for backward compatibility of future extensions.
  For example, if a future specification that is backward compatible
  with this specification specifies some new parameters, then a
  receiver conforming to this specification is capable of receiving
  data per the new payload specification while ignoring the newly
  specified parameters.  The same requirement is also present in RFC
  3984.
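
  A receiver can meet the requirement to ignore unspecified parameters
  by filtering a parameter string against the set of parameter names it
  implements.  The sketch below is one possible approach, not a
  mandated one: parse_known_params is a hypothetical helper, and the
  parameter names used with it are placeholders, since the optional
  parameters of this payload format are not yet specified.

```python
from typing import Dict, Iterable


def parse_known_params(param_string: str,
                       known: Iterable[str]) -> Dict[str, str]:
    """Parse 'name=value' pairs separated by ';', dropping unknown names."""
    known_names = {name.lower() for name in known}
    result: Dict[str, str] = {}
    for item in param_string.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty items such as a trailing ';'
        name, _, value = item.partition("=")
        name = name.strip().lower()
        if name in known_names:
            result[name] = value.strip()
        # Any other parameter is silently ignored.
    return result
```

  With the known set {"packetization-mode"} (a name borrowed from RFC
  3984 purely for illustration), the string "packetization-mode=1;
  future-param=7" yields only the packetization-mode entry.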

  Media Type name:     video

  Media subtype name:  H264-MVC or H264

  The media subtype "H264" MUST be used for RTP streams using RFC 3984,
  i.e. not using any of the new features introduced by this
  specification compared to RFC 3984.  For RTP streams using any of the
  new features introduced by this specification compared to RFC 3984,
  the media subtype "H264-MVC" SHOULD be used, and the media subtype
  "H264" MAY be used.  Use of the media subtype "H264" for RTP streams
  using the new features allows RFC 3984 receivers to negotiate and
  receive H.264/AVC or MVC streams packetized according to this
  specification, while ignoring media parameters and NAL unit types
  they do not recognize.

  Required parameters: none

  OPTIONAL parameters: to be specified.

  Encoding considerations:
      This type is only defined for transfer via RTP (RFC 3550).

  Security considerations:
      See section 10 of RFC XXXX.

  Public specification:
      Please refer to RFC XXXX and its section 13.

  Additional information: none

  File extensions: none

  Macintosh file type code: none

  Object identifier or OID: none

  Person & email address to contact for further information:

  Intended usage: COMMON

  Author: NN

  Change controller:
      IETF Audio/Video Transport working group delegated from the IESG.

9.2.     SDP Parameters

9.2.1.    Mapping of Payload Type Parameters to SDP

  The media type video/H264-MVC string is mapped to fields in the
  Session Description Protocol (SDP) as follows:

  The media name in the "m=" line of SDP MUST be video.

  The encoding name in the "a=rtpmap" line of SDP MUST be H264-MVC (the
  media subtype).

  The clock rate in the "a=rtpmap" line MUST be 90000.

  The OPTIONAL parameters, when present, MUST be included in the
  "a=fmtp" line of SDP.  These parameters are expressed as a media type
  string, in the form of a semicolon separated list of parameter=value
  pairs.
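
  The mapping above can be illustrated with a small helper that emits
  the corresponding SDP lines.  This is a sketch under assumptions:
  mvc_sdp_lines is a hypothetical function, the port, the dynamic
  payload type, and the RTP/AVP profile are example choices, and any
  parameter names passed in are placeholders because the optional
  parameters of this format are still to be specified.

```python
from typing import Dict, List


def mvc_sdp_lines(port: int, payload_type: int,
                  params: Dict[str, str]) -> List[str]:
    """Build the m=, a=rtpmap and (if needed) a=fmtp lines for H264-MVC."""
    lines = [
        # The media name MUST be "video".
        f"m=video {port} RTP/AVP {payload_type}",
        # The encoding name is H264-MVC and the clock rate is 90000.
        f"a=rtpmap:{payload_type} H264-MVC/90000",
    ]
    if params:
        # Optional parameters form a semicolon separated list of
        # parameter=value pairs on the a=fmtp line.
        pairs = ";".join(f"{name}={value}" for name, value in params.items())
        lines.append(f"a=fmtp:{payload_type} {pairs}")
    return lines
```

  When no optional parameter is given, the helper emits no "a=fmtp"
  line at all.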

9.2.2.    Usage with the SDP Offer/Answer Model

  TBD.

9.2.3.    Usage with Session Multiplexing

  If session multiplexing is used, the rules on signaling media
  decoding dependency in SDP as defined in
  [I-D.draft-ietf-mmusic-decoding-dependency] apply.

9.2.4.    Usage in Declarative Session Descriptions

  TBD.

9.3.     Examples

  TBD.

9.4.     Parameter Set Considerations

  Please see section 8.4 of [RFC3984].

10.   Security Considerations

  Please see section 9 of [RFC3984].

11.   Congestion Control

  TBD.

12.   IANA Considerations

  Request for media type registration to be added.

13.   References

13.1.    Normative References

  [H.264]    ITU-T Recommendation H.264, "Advanced video coding for
  generic audiovisual services", Version 4, July 2005.

  [I-D.draft-ietf-avt-rtp-svc] Wenger, S., Schierl, T., and Wang,
  Y.-K., "RTP payload format for SVC video", draft-ietf-avt-rtp-svc-03
  (work in progress), November 2007.

  [I-D.draft-ietf-mmusic-decoding-dependency] Schierl, T., and Wenger,
  S., "Signaling media decoding dependency in Session Description
  Protocol (SDP)", draft-ietf-mmusic-decoding-dependency-00 (work in
  progress), November 2007.

  [MPEG4-10] ISO/IEC International Standard 14496-10:2005.

  [MVC]     Joint Video Team, "Joint Draft 4 of MVC", available from
  http://ftp3.itu.ch/av-arch/jvt-site/2007_06_Geneva/JVT-X209.zip,
  Geneva, Switzerland, June 2007.

  [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
  Requirement Levels", BCP 14, RFC 2119, March 1997.

  [RFC3548] Josefsson, S., "The Base16, Base32, and Base64 Data
  Encodings", RFC 3548, July 2003.

  [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and Jacobson,
  V., "RTP: A Transport Protocol for Real-Time Applications", STD 64,
  RFC 3550, July 2003.

  [RFC3984] Wenger, S., Hannuksela, M., Stockhammer, T., Westerlund,
  M., and Singer, D., "RTP Payload Format for H.264 Video", RFC 3984,
  February 2005.

  [RFC4566] Handley, M., Jacobson, V., and Perkins, C., "SDP: Session
  Description Protocol", RFC 4566, July 2006.

  [SVC]     Joint Video Team, "Joint Draft 11 of SVC Amendment",
  available from
  http://ftp3.itu.ch/av-arch/jvt-site/2007_06_Geneva/JVT-X201.zip,
  Geneva, Switzerland, June 2007.

13.2.    Informative References

  [DVB-H]   DVB - Digital Video Broadcasting (DVB); DVB-H
  Implementation Guidelines, ETSI TR 102 377, 2005.

  [H.241]   ITU-T Rec. H.241, "Extended video procedures and control
  signals for H.300-series terminals", May 2006.

  [IGMP]    Cain, B., Deering, S., Kouvelas, I., Fenner, B., and
  Thyagarajan, A., "Internet Group Management Protocol, Version 3", RFC
  3376, October 2002.

  [McCanne] McCanne, S., Jacobson, V., and Vetterli, M.,
  "Receiver-driven layered multicast", in Proc. of ACM SIGCOMM'96,
  pages 117--130, Stanford, CA, August 1996.

  [MBMS]    3GPP - Technical Specification Group Services and System
  Aspects; Multimedia Broadcast/Multicast Service (MBMS); Protocols and
  codecs (Release 6), December 2005.

  [MPEG2]   ISO/IEC International Standard 13818-2:1993.

  [RFC3450] Luby, M., Gemmell, J., Vicisano, L., Rizzo, L., and
  Crowcroft, J., "Asynchronous layered coding (ALC) protocol
  instantiation", RFC 3450, December 2002.

14.   Authors' Addresses

  Ye-Kui Wang                    Phone: +358-50-486-7004
  Nokia Research Center          Email: ye-kui.wang@nokia.com
  P.O. Box 100
  FIN-33721 Tampere
  Finland

  Thomas Schierl                 Phone: +49-30-31002-227
  Fraunhofer HHI                 Email: schierl@hhi.fhg.de
  Einsteinufer 37
  D-10587 Berlin
  Germany

15.   Intellectual Property Statement

  The IETF takes no position regarding the validity or scope of any
  Intellectual Property Rights or other rights that might be claimed to
  pertain to the implementation or use of the technology described in
  this document or the extent to which any license under such rights
  might or might not be available; nor does it represent that it has
  made any independent effort to identify any such rights.  Information
  on the procedures with respect to rights in RFC documents can be
  found in BCP 78 and BCP 79.

  Copies of IPR disclosures made to the IETF Secretariat and any
  assurances of licenses to be made available, or the result of an
  attempt made to obtain a general license or permission for the use of
  such proprietary rights by implementers or users of this
  specification can be obtained from the IETF on-line IPR repository at
  http://www.ietf.org/ipr.

  The IETF invites any interested party to bring to its attention any
  copyrights, patents or patent applications, or other proprietary
  rights that may cover technology that may be required to implement
  this standard.  Please address the information to the IETF at
  ietf-ipr@ietf.org.

16.   Disclaimer of Validity

  This document and the information contained herein are provided on an
  "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
  OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
  THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
  OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
  THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
  WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

17.   Copyright Statement

  Copyright (C) The IETF Trust (2007).

  This document is subject to the rights, licenses and restrictions
  contained in BCP 78, and except as set forth therein, the authors
  retain all their rights.

18.   Acknowledgment

  Funding for the RFC Editor function is currently provided by the
  Internet Society.