Liaison statement
Response to Liaison Statement SC 29 N 6689

State Posted
Submitted Date 2005-11-13
From Group IETF
From Contact Stephen L. Casner
To Group ISO-IEC-JTC1-SC29-WG11
To Contacts Yukiko Ogura <>
Response Contact Stephen Casner <>
Technical Contact Joerg Ott <>
Colin Perkins <>
Purpose In response
Attachments (None)
In response to your Liaison Statement ISO/IEC/JTC1/SC29/WG11/N7181
dated April 2005 (Busan):

The IETF MMUSIC and AVT working groups were introduced to the ideas
presented in this liaison statement at the 62nd IETF meeting (March
2005, Minneapolis). The working group chairs and several individuals
reviewed the internet-drafts relating to MPEG-21 DIA (draft-guenkova-
mmusic-mpeg21-sdpng-00.txt and draft-feiten-avt-bsacmode-for-rfc3640-
00.txt) in depth, and Ingo Wolf presented the ideas behind the drafts
to the two working groups.  We thank you very much for this comprehensive
introduction.  A full and detailed review of the entire document you
attached to your liaison statement was unfortunately not possible so
we base our response on the input we received to our meetings and the
open IETF process in general.

Regarding the applicability to Internet technologies as developed by
the IETF, the clear consensus in both the MMUSIC and AVT working
groups was that there were numerous issues with the approach suggested
in the aforementioned drafts.  The proposed approach is not compatible
with relevant Internet technologies as they have been practiced for
many years. For the details, we refer you to the minutes of the MMUSIC
and AVT working groups at the 62nd meeting, which we attach for your

Three aspects are to be noted in particular:

1.  The coverage of MPEG-21 overlaps with that of the MIME media types
    namespace, but does not provide the same level of expressiveness
    (i.e., it is more restricted with respect to naming of
    non-audio/visual media types). At the same time, the proposals
    allow numerous media types for which no transport is defined.

2.  Identifying a media type itself is insufficient, since the
    transport, packetization, and associated parameters need to be

3.  Use of large in-band media item adaptation information is not
    preferred, for reasons of backwards compatibility, and to avoid
    requiring complex adaptation points within the network.

While the ideas presented are interesting, the issues identified make
it difficult to adopt them within the IETF framework. We would
encourage you to align the MPEG-21 work in overlapping areas with the
current Internet practice for MIME media types and RTP payload
formats, and would recommend that you pursue the non-conflicting
(because non-overlapping) aspects in a fashion that unambiguously
delineates them from ongoing IETF work. We would encourage particular
consideration be given to media types that do not yet overlap with
IETF technologies, but which may move towards applicability in the
Internet context. We welcome further discussion of this subject in the
form of an Internet-Draft that identifies which of the media types you
envision to fall into this category, and suggests how to integrate
these types properly with the MIME registry. This should be done by
draft MIME registration where applicable, subject to IETF review.

Finally, we would like to note that, at this point in time, those
parts of the IETF community working on real-time multimedia
communications have their primary focus on SDP (RFC2327) and
extensions to it as means to express capabilities and perform
negotiation among peers.  SDPng is clearly an experimental technology
with, naturally, an uncertain future at this stage (as documented in the
MMUSIC charter) and, at this point, does not have the necessary
support for moving towards a standards-track technology.

Excerpt from MMUSIC working group minutes for the 62nd IETF

   Harmonisation between SDPng and MPEG-21


   Ingo Wolf presented further on a proposed harmonization between
   SDPng and MPEG-21: He claimed that SDPng was transport-oriented
   while MPEG-21 DIA was technology-independent and focused on (media)
   presentation. He suggested an integration model in which all the
   SDPng containers are reused. The <cap>, <def>, <cfg> elements
   largely remain the same, only some adjustments are proposed to the
   RTP package. The <constraints> element is augmented to cover Usage
   Environment Descriptions (UED) and should be used to address
   constraints at all levels of abstraction.  Important extensions
   include the reference to external definitions using the href
   attribute, the application of MPEG-7 media stream and codec
   specifications, and some revisions to the name-value pair based
   mechanism for cross-referencing definitions within an SDPng
   document. Ingo walked through a detailed example (see slides for
   the details).

   Steve Casner noted that the namespace for MPEG-21 has a huge
   overlap with the one defined in RFC3551. He further pointed out that
   to the extent that MPEG-21 identifies how transport is to be done,
   the RTP names identify actual payload formats. One cannot simply
   skip that identification as this is needed in addition to the codec
   identification.  Brian Rosen agreed that one needs to explicitly
   specify the mapping.  Colin noted that the key advantage of SIP/
   SDP is that peers can negotiate arbitrary MIME types. He questioned
   the benefit of the proposal, which is limited to a subset of the
   codecs that are supported through the MIME type mechanism, and added
   that the proposal does not support application/*, text/*,
   etc. Magnus Westerlund pointed out that the proposal allows
   streams to be described that cannot be transported, and Dirk
   Kutscher voiced concerns similar to those of Colin and Magnus.
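
   As an illustration only (it is not part of the minutes), the
   following minimal Python sketch shows why a codec or media type
   name alone does not describe a stream in SDP: the transport, the
   RTP payload format name, and the format parameters each appear
   explicitly.  The helper name sdp_media_description is hypothetical,
   and the port, payload type, and parameter values are illustrative
   assumptions modelled on RFC 3640 usage rather than taken from the
   drafts under discussion.

```python
def sdp_media_description(media, port, transport, payload_type,
                          encoding, clock_rate, channels=None, fmtp=None):
    """Build the SDP lines that describe one RTP stream (hypothetical helper)."""
    # m= line: top-level media type, port, transport protocol, payload type.
    lines = [f"m={media} {port} {transport} {payload_type}"]

    # a=rtpmap binds the dynamic payload type to a payload format name
    # and clock rate; the channel count is optional.
    rtpmap = f"a=rtpmap:{payload_type} {encoding}/{clock_rate}"
    if channels is not None:
        rtpmap += f"/{channels}"
    lines.append(rtpmap)

    # a=fmtp carries parameters specific to the payload format.
    if fmtp:
        params = "; ".join(f"{name}={value}" for name, value in fmtp.items())
        lines.append(f"a=fmtp:{payload_type} {params}")
    return lines


# Illustrative values only (assumptions, not taken from the drafts).
aac_lines = sdp_media_description(
    "audio", 49230, "RTP/AVP", 96, "mpeg4-generic", 48000, channels=6,
    fmtp={
        "streamtype": 5,
        "profile-level-id": 16,
        "mode": "AAC-hbr",
        "config": "11B0",
        "sizeLength": 13,
        "indexLength": 3,
        "indexDeltaLength": 3,
    },
)
print("\r\n".join(aac_lines))
```

   Dropping any one of the three lines leaves a receiver unable to
   tell how the codec data is packetized, which is the mapping that
   the speakers asked to be made explicit.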

   In summary, it is questionable how the approach presented relates
   properly to the work in MMUSIC, AVT, and other groups in the IETF
   at large.

---------------------------------------------------------------------------
Excerpt from AVT working group minutes for the 62nd IETF

   New mode for RFC 3640: AAC-BSAC with MPEG-21 gBSD


   Ingo Wolf presented a proposal to create a new mode of operation
   for the RFC 3640 generic MPEG-4 payload format. The mode is for
   carrying MPEG-21 digital item adaptation information, which is
   written in XML for the AAC-BSAC audio codec.

   Steve Casner questioned the need for including this information
   in-band, wondering if it is possible to convey the data out-of-band
   to avoid the overhead due to the XML representation. Ingo replied
   that the data might theoretically be different for each packet, so
   it is desirable to convey it in-band, and commented that the XML
   data can be compressed to give an overhead of only 10-15% compared
   to the BSAC frame size. Colin Perkins asked if the data could be
   derived from the payload, rather than being conveyed as a separate
   description. This is possible, but would require the adaptation
   point to parse the payload, which is a complex operation that may
   not be feasible on a large scale.

   Stephan Wenger suggested an alternative design that uses two
   synchronised RTP sessions, one for the media and one for the
   adaptation information.  This would allow participants to choose
   whether they wish to receive the adaptation information as part of
   an end-to-end session setup, without the need for a middlebox to
   act as an adaptation point. There was some discussion regarding the
   appropriateness of middlebox based adaptation solutions, with Colin
   Perkins noting that RTP does support this concept (via the RTP
   translator abstraction) provided the presence of the middle box is

   Colin Perkins asked what the reason is to have this as a named
   mode of RFC 3640, rather than using the generic mode. Ingo
   responded that it is to indicate the presence of the auxiliary data
   field and its content in accordance with RFC 3640 recommendations.

   Magnus Westerlund asked how likely it is that other proposals will
   appear for carrying this generic scalability information for
   other media.  This is not really known, but the description format
   is very generic and could be used for any other scalable
   codec. Stephan Wenger and Colin Perkins commented that from an
   architectural point of view it would make more sense to have this
   information in a separate stream, allowing it to be used with any
   stream rather than defining a point solution.
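
   As an illustration only (it is not part of the minutes), the
   separate-stream design might be sketched as follows, assuming the
   SDP offer/answer model of RFC 3264, in which a receiver declines
   an offered stream by answering with port zero.  The helper name
   answer_media_lines, the labels, and the second m= line for
   adaptation information are hypothetical; no payload format for
   such a stream is defined by the drafts or by this statement.

```python
def answer_media_lines(offered, accept):
    """Return answer m= lines, declining streams that are not wanted.

    offered: list of (label, m_line) tuples from the offer.
    accept:  set of labels the receiver wishes to receive.
    """
    answer = []
    for label, m_line in offered:
        media, port, proto, fmt = m_line[len("m="):].split(" ", 3)
        if label not in accept:
            # Port zero declines the stream in SDP offer/answer (RFC 3264).
            port = "0"
        # Simplification: a real answer would carry the answerer's own
        # receive port; the offered port is reused here for brevity.
        answer.append(f"m={media} {port} {proto} {fmt}")
    return answer


# Hypothetical offer: one media session and one adaptation-information
# session (the second line is an assumption made for illustration).
offer = [
    ("media", "m=audio 49230 RTP/AVP 96"),
    ("adaptation", "m=application 49232 RTP/AVP 97"),
]

# A receiver that does not want adaptation information.
legacy = answer_media_lines(offer, accept={"media"})
# A receiver that wants both streams.
adaptive = answer_media_lines(offer, accept={"media", "adaptation"})
print("\r\n".join(legacy))
```

   The choice is made end to end at session setup, so the media
   stream itself is unchanged for receivers that decline the second
   session.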

   In conclusion, there is some interest but also much concern about
   this work. The way forward would be to consider the bigger picture,
   working on requirements to decide if it is worth generalizing the
   solution. It is also important not to forget how security functions
   will fit into such an architecture.