Network Working Group                                         Y. Kikuchi
Request for Comments: 3016                                       Toshiba
Category: Standards Track                                      T. Nomura
                                                             S. Fukunaga
                                                               Y. Matsui
                                                               H. Kimata
                                                           November 2000

           RTP Payload Format for MPEG-4 Audio/Visual Streams

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2000).  All Rights Reserved.

Abstract

   This document describes Real-Time Transport Protocol (RTP) payload
   formats for carrying each of MPEG-4 Audio and MPEG-4 Visual
   bitstreams without using MPEG-4 Systems.  For the purpose of directly
   mapping MPEG-4 Audio/Visual bitstreams onto RTP packets, it provides
   specifications for the use of RTP header fields and also specifies
   fragmentation rules.  It also provides specifications for
   Multipurpose Internet Mail Extensions (MIME) type registrations and
   the use of Session Description Protocol (SDP).
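
   These payload formats carry MPEG-4 data directly in RTP, so every
   packet starts with the RTP fixed header. As a rough illustration, the
   Python sketch below packs and unpacks the 12-byte RTP fixed header
   defined by the RTP specification (not by this memo), assuming no
   padding, no header extension, and no CSRC list. The function names
   and the example payload type 96 (from the dynamic range) are
   hypothetical. How the marker bit, timestamp, and sequence number are
   interpreted for MPEG-4 streams is specified elsewhere in this
   document and is not modeled here.

```python
import struct

RTP_VERSION = 2


def pack_rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
                    marker: bool = False) -> bytes:
    """Hypothetical helper: pack the 12-byte RTP fixed header.

    Assumes P=0, X=0 and CC=0 (no padding, extension, or CSRC list).
    """
    if not 0 <= payload_type <= 127:
        raise ValueError("payload type must fit in 7 bits")
    # First octet: version (2 bits), then P, X and CC, all zero here.
    byte0 = RTP_VERSION << 6
    # Second octet: marker bit (1 bit) followed by payload type (7 bits).
    byte1 = (int(marker) << 7) | payload_type
    # "!" selects network (big-endian) byte order: 1+1+2+4+4 = 12 bytes.
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data: bytes) -> dict:
    """Hypothetical helper: decode the first 12 bytes of an RTP packet."""
    byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": ts,
        "ssrc": ssrc,
    }
```

   For example, pack_rtp_header(96, 1, 90000, 0x1234, marker=True)
   yields 12 bytes beginning with 0x80 0xE0 (version 2, marker set,
   payload type 96).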

1. Introduction

   The RTP payload formats described in this document specify how MPEG-4
   Audio [3][5] and MPEG-4 Visual streams [2][4] are to be fragmented
   and mapped directly onto RTP packets.

   These RTP payload formats enable transport of MPEG-4 Audio/Visual
   streams without using the synchronization and stream management
   functionality of MPEG-4 Systems [6].  These RTP payload formats are
   intended for systems that have intrinsic stream management
   functionality and thus require no such functionality from MPEG-4
   Systems.  H.323 terminals are an example of such systems: there,
   MPEG-4 Audio/Visual streams are managed not by MPEG-4 Systems Object
   Descriptors but by H.245.  The streams are mapped directly onto RTP
   packets without using the MPEG-4 Systems Sync Layer.  Other examples are
   SIP and RTSP, where MIME and SDP are used.  The MIME types and SDP
   usages of the RTP payload formats described in this document are
   defined to specify the attributes of Audio/Visual streams (e.g.,
   media type, packetization format, and codec configuration) directly,
   without using MPEG-4 Systems.  The obvious benefit is that these
   MPEG-4 Audio/Visual RTP payload formats can be handled in a unified
   way together with the formats defined for non-MPEG-4 codecs.  The
   disadvantage is that interoperability with environments using MPEG-4
   Systems may be difficult; other payload formats may be better suited
   to those applications.
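
   As an illustration of the SDP usage described above, the sketch below
   builds an SDP media description for an MPEG-4 Visual stream using the
   MIME subtype this memo registers for it, MP4V-ES, with the
   profile-level-id and config parameters carried in an fmtp attribute.
   The function name is hypothetical, and the port, dynamic payload
   type, and config value in the example are assumptions chosen for
   illustration; the config string is a truncated placeholder, not a
   usable decoder configuration.

```python
def mp4v_sdp_media(port: int, payload_type: int, profile_level_id: int,
                   config_hex: str) -> str:
    """Hypothetical helper: SDP media lines for an MP4V-ES stream (sketch)."""
    if not 96 <= payload_type <= 127:
        raise ValueError("expected a dynamic RTP payload type (96-127)")
    # config carries the codec configuration as a hexadecimal string;
    # bytes.fromhex() raises ValueError if it is not valid hex.
    bytes.fromhex(config_hex)
    lines = [
        f"m=video {port} RTP/AVP {payload_type}",
        # 90000 Hz: the RTP timestamp clock rate assumed here for video.
        f"a=rtpmap:{payload_type} MP4V-ES/90000",
        f"a=fmtp:{payload_type} profile-level-id={profile_level_id};"
        f"config={config_hex}",
    ]
    # SDP lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"


print(mp4v_sdp_media(49170, 98, 1, "000001B001"), end="")
```

   Because the codec configuration travels in SDP rather than in MPEG-4
   Systems descriptors, a receiver can set up its decoder from the
   session description alone, in the same way as for non-MPEG-4 codecs.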

   The semantics of the RTP header fields in such cases need to be
   clearly defined, including their association with MPEG-4
   Audio/Visual data elements.  In addition, it is beneficial to define
   rules for fragmenting MPEG-4 Video streams into RTP packets so as to
   enhance error resiliency by exploiting the error resilience tools
   provided inside the MPEG-4 Video stream.
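
   To make the contrast between careless and media-aware fragmentation
   concrete, the Python sketch below splits an MPEG-4 Visual byte stream
   at start-code prefixes (0x000001) and packs whole units into payloads
   of at most max_payload bytes, cutting a unit only when it alone
   exceeds that limit.  It is an illustration under simplifying
   assumptions, not the fragmentation rules this document defines: the
   function names and the max_payload parameter are hypothetical, and
   because video-packet resync markers are not start codes, a real
   packetizer would have to parse the bitstream to align payloads with
   them.

```python
START_CODE_PREFIX = b"\x00\x00\x01"


def split_at_start_codes(stream: bytes) -> list[bytes]:
    """Split a byte stream into units that each begin at a 0x000001 prefix.

    Bytes preceding the first start code (if any) form their own unit.
    """
    positions = []
    i = stream.find(START_CODE_PREFIX)
    while i != -1:
        positions.append(i)
        i = stream.find(START_CODE_PREFIX, i + len(START_CODE_PREFIX))
    if not positions or positions[0] != 0:
        positions.insert(0, 0)
    bounds = positions + [len(stream)]
    return [stream[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def packetize(stream: bytes, max_payload: int) -> list[bytes]:
    """Greedily pack whole units into payloads of at most max_payload bytes.

    A unit is cut across payloads only when it alone exceeds max_payload.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    payloads: list[bytes] = []
    current = b""
    for unit in split_at_start_codes(stream):
        if len(unit) > max_payload:
            # Flush the pending payload, then cut the oversized unit.
            if current:
                payloads.append(current)
                current = b""
            for off in range(0, len(unit), max_payload):
                payloads.append(unit[off:off + max_payload])
        elif len(current) + len(unit) > max_payload:
            # Unit does not fit: close the current payload, start a new one.
            payloads.append(current)
            current = unit
        else:
            current += unit
    if current:
        payloads.append(current)
    return payloads
```

   The greedy design aims to make most payloads begin at a start code,
   giving the receiver a resynchronization point at the start of each
   packet, while still combining small units to save header overhead.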

1.1 MPEG-4 Visual RTP payload format

   MPEG-4 Visual is a visual coding standard with many new features,
   including high coding efficiency, high error resiliency, and
   object-based coding of multiple, arbitrarily shaped objects [2].  It
   covers a wide range of bitrates, from tens of kbit/s to several
   Mbit/s.  It also covers a wide variety of networks, ranging from
   those guaranteed to be almost error-free to mobile networks with
   high error rates.

   Because MPEG-4 Visual is used over such a wide variety of networks,
   the fragmentation rules for MPEG-4 Visual bitstreams defined in this
   document should not be overly restrictive; a rule such as "a single
   video packet shall always be mapped on a single RTP packet" may be
   inappropriate.  On the other hand, careless, media-unaware
   fragmentation may degrade error resiliency and bandwidth
   efficiency.  The fragmentation rules described in this document are
