INTERNET-DRAFT                                        Katsushi Kobayashi
draft-ietf-avt-dv-video-01.txt         Communication Research Laboratory
                                                          Akimichi Ogawa
                                                         Keio University
                                                          Stephen Casner
                                                           Cisco Systems
                                                        Carsten  Bormann
                                                 Universitaet Bremen TZI
                                                        October 18, 1999
                                                     Expires March  2000

                 RTP Payload Format for DV Format Video

Status of this Memo

   This document is an Internet-Draft and is in full conformance with all
   provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

1. Abstract

   This document specifies the packetization scheme for encapsulating
   the compressed digital video data streams commonly known as "DV" into
   a payload format for the Real-Time Transport Protocol (RTP).  There
   are two kinds of DV format, one for consumer use and the other for
   professional use.  The original "DV" specification, designed for
   consumer-use digital VCRs, has been approved as the IEC61834 set of
   standards.  The specifications for professional use have been
   published as SMPTE 306M (D-7) and 314M (D-9), both of which are based
   on consumer DV.  The RTP payload format specified in this document
   supports IEC61834 consumer DV and the professional SMPTE 306M and
   314M (DV-based) formats.

2. Introduction

Kobayashi, et al           Expires March 2000                   [Page 1]


Internet Draft                                          October 18, 1999

   DV video compression formats are designed as recording formats for
   helical-scan magnetic tape media.  Unlike MPEG, the DV format uses
   only intra-frame DCT compression and no inter-frame compression.
   The DV standard for consumer-market devices has been approved as the
   IEC61834 series, which comprises the whole specification of
   consumer-use digital video, including the mechanical specifications
   of the cassette, the helical-scan magnetic recording format, error
   correction on the magnetic tape, the DCT video encoding format, and
   the audio encoding format [1].  The digital interface specifications,
   split off from the DV standard, are published as the IEC61883 series,
   which defines an interface over IEEE 1394 networks [2,3].  The IEC
   specification set supports several video formats: SD-VCR (525/60,
   625/50), HD-VCR (1125/60, 1250/50), SDL-VCR (525/60, 625/50),
   PALPlus, DVB (Digital Video Broadcasting) and ATV (Advanced
   Television).

   The DV standards intended for professional use are published by
   SMPTE as 306M and 314M.  Both standards are based on the IEC61834 DV
   standard and define the D-7 system (525/60, 625/50) and the D-9
   system (525/60 25 Mbps, 625/50 25 Mbps, 525/60 50 Mbps, 625/50
   50 Mbps) [4,5].

   This document specifies the RTP payload format for encapsulating
   both consumer and professional DV format data streams into the
   Real-time Transport Protocol (RTP), version 2 [6].  However, IEC61834
   also covers magnetic tape recording for digital TV broadcasting
   systems such as DVB and ATV, which use MPEG2 encoding.  The payload
   format for encapsulating MPEG2 into RTP has already been defined in
   RFC 2250 [7].  The RFC 2250 payload format provides a better-optimized
   way of transmitting an MPEG2 stream over the Internet than
   encapsulating MPEG2 first into the DVB or ATV format and then into
   RTP.  Therefore, packetization of the MPEG2-based DV formats is
   outside the scope of this document.

   Consequently, the payload format specified in this document supports
   six video formats of the IEC standard, SD-VCR (525/60, 625/50),
   HD-VCR (1125/60, 1250/50) and SDL-VCR (525/60, 625/50), and six of
   the SMPTE standards, D-7 (525/60, 625/50), D-9 25 Mbps (525/60,
   625/50) and D-9 50 Mbps (525/60, 625/50).

   Throughout this specification, we make extensive use of the
   terminology of the IEC and SMPTE standards.  The reader should
   consult the original references for definitions of these terms.

3. DV format encoding

   The DV format is designed for magnetic tape applications and is
   optimized for helical-scan magnetic recording on tape media.  All
   data, including video, audio and other system data, are managed in
   units of one video picture frame.

   The DV encoding is composed of a three-level hierarchical structure.
   A picture frame is divided into rectangle- or
   clipped-rectangle-shaped DCT super blocks.  DCT super blocks are
   divided into 27 rectangle- or square-shaped DCT macro blocks.  Audio
   data is encoded in PCM format.  The sampling frequency is 32 kHz,
   44.1 kHz or 48 kHz, the quantization is 16-bit linear, 12-bit
   non-linear or 20-bit linear, and there may be up to 8 channels.  Only
   certain combinations of these parameters are allowed, depending upon
   the video format; the restrictions are specified in each standard.
   A frame of data in the DV format stream is divided into several "DIF
   sequences".  A DIF sequence is composed of an integral number of
   80-byte DIF blocks.  A DIF block is the primitive unit for all
   treatment of a DV stream.  Each DIF block contains a 3-byte ID header
   that specifies the type of the DIF block and its position in the DIF
   sequence.  Five types of DIF blocks are defined: DIF sequence header,
   Subcode, Video Auxiliary information (VAUX), Audio and Video.  Each
   Audio DIF block carries, after its ID header, 5 bytes of Audio
   Auxiliary data (AAUX) and 72 bytes of audio data.
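   The following Python sketch illustrates the DIF block layout
   described above.  It assumes the ID header layout and section type
   code points of IEC61834 as we understand them (section type in the
   top 3 bits of the first byte, DIF sequence number in the top 4 bits
   of the second byte, DIF block number in the third byte); all names
   are hypothetical and the standard itself remains authoritative.

```python
from dataclasses import dataclass

DIF_BLOCK_SIZE = 80   # every DIF block is 80 bytes long
ID_SIZE = 3           # 3-byte ID header at the start of every block
AAUX_SIZE = 5         # audio blocks: 5 bytes of AAUX, then 72 bytes of audio

# Assumed section type (SCT) code points, per our reading of IEC61834.
SECTION_TYPES = {0: "header", 1: "subcode", 2: "vaux", 3: "audio", 4: "video"}


@dataclass
class DifId:
    section: str  # one of the five DIF block types
    dseq: int     # DIF sequence number within the frame
    dbn: int      # DIF block number within its section


def parse_dif_id(block: bytes) -> DifId:
    """Decode the 3-byte ID header of one DIF block (assumed bit layout)."""
    if len(block) != DIF_BLOCK_SIZE:
        raise ValueError("a DIF block must be exactly 80 bytes")
    sct = block[0] >> 5   # top 3 bits of byte 0
    dseq = block[1] >> 4  # top 4 bits of byte 1
    dbn = block[2]        # all of byte 2
    if sct not in SECTION_TYPES:
        raise ValueError(f"unknown section type {sct}")
    return DifId(SECTION_TYPES[sct], dseq, dbn)


def split_audio_block(block: bytes) -> tuple[bytes, bytes]:
    """Return (AAUX, audio data) from an audio DIF block: 3 + 5 + 72 bytes."""
    if parse_dif_id(block).section != "audio":
        raise ValueError("not an audio DIF block")
    aaux = block[ID_SIZE:ID_SIZE + AAUX_SIZE]
    samples = block[ID_SIZE + AAUX_SIZE:]
    return aaux, samples
```

   For example, a block whose ID bytes are 0x60 0x20 0x05 decodes as an
   audio block in DIF sequence 2 with block number 5, and splits into 5
   bytes of AAUX and 72 bytes of audio data.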

4. RTP payload format for DV encoding

   Each RTP packet starts with the RTP header as defined in RFC 1889
   [6].  No additional payload-format-specific header is required for
   this payload format.
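   Because no payload-specific header is used, a DV packet is simply
   the 12-byte RTP fixed header of RFC 1889 followed by DIF blocks.  A
   minimal Python sketch of a sender building such a packet is shown
   below, assuming no CSRC list, no padding and no header extension;
   the function name is hypothetical.

```python
import struct

RTP_VERSION = 2


def rtp_packet(pt: int, seq: int, timestamp: int, ssrc: int,
               marker: bool, dif_blocks: bytes) -> bytes:
    """Build an RTP packet whose payload is whole DIF blocks, no extra header."""
    if not 0 <= pt <= 127:
        raise ValueError("payload type must fit in 7 bits")
    if len(dif_blocks) % 80:
        raise ValueError("payload must be a whole number of 80-byte DIF blocks")
    byte0 = RTP_VERSION << 6            # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | pt     # M bit, then the 7-bit payload type
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + dif_blocks
```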

4.1 RTP header usage

   The meaning of RTP header fields that are specific to the DV format
   is described in the following:

   Payload type (PT): The payload type is dynamically assigned by means
   outside the scope of this document. If multiple encoding formats are
   to be used within one RTP session, then multiple dynamic payload
   types MUST be assigned, one for each DV encoding format.  The sender
   MUST change to the corresponding payload type whenever the encoding
   format is changed.  Although VAUX data contains some encoding
   attributes of the stream, the sender MUST NOT rely on the information
   included in VAUX to notify the receiver of an encoding format change,
   because VAUX data only represents subsidiary attributes whose meaning
   depends on the video encoding format.  Even if VAUX data is received,
   the receiver cannot interpret those attributes until the video
   encoding format has been determined.

   Timestamp: 32-bit 90 kHz timestamp representing the time at which the
   first data in the frame was sampled.  All RTP packets within the same
   video frame MUST have the same timestamp.  The timestamp SHOULD
   increment by a multiple of the nominal interval for one frame time,
   as given in the following table:

      Mode        Frame rate (Hz)      Timestamp increment per frame
                                       (90 kHz units)

     525-60         29.97                   3003
     625-50         25                      3600
     1125-60        30                      3000
     1250-50        25                      3600
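   The increments in the table follow from dividing the 90 kHz clock by
   the frame rate; the 29.97 Hz rate of 525-60 is exactly 30000/1001
   Hz, which gives 90000 x 1001 / 30000 = 3003.  The Python sketch
   below computes the increment and the 32-bit wrapped timestamp of a
   given frame from a base (initial) timestamp; the function names are
   illustrative assumptions.

```python
from fractions import Fraction

RTP_CLOCK = 90000  # Hz; the clock rate used by this payload format

# Nominal frame rates; "29.97 Hz" is exactly 30000/1001 Hz.
FRAME_RATES = {
    "525-60": Fraction(30000, 1001),
    "625-50": Fraction(25),
    "1125-60": Fraction(30),
    "1250-50": Fraction(25),
}


def frame_increment(mode: str) -> int:
    """Timestamp increment for one frame time, in 90 kHz units."""
    inc = RTP_CLOCK / FRAME_RATES[mode]  # exact rational arithmetic
    if inc.denominator != 1:
        raise ValueError(f"{mode}: frame interval is not a whole number of ticks")
    return inc.numerator


def frame_timestamp(base: int, frame_index: int, mode: str) -> int:
    """RTP timestamp of frame number frame_index, wrapping modulo 2**32."""
    return (base + frame_index * frame_increment(mode)) & 0xFFFFFFFF
```

   A sender that discards some frames would still advance frame_index
   for the dropped frames, so the timestamp keeps counting real time.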

   When a DV stream is obtained from an IEEE 1394 interface, the
   progress of video frame times MAY be monitored using the SYT
   timestamp carried in the CIP header, as described in Appendix A.

   Marker bit (M): The marker bit of the RTP fixed header is set to one
   on the last packet of a video frame, and otherwise, must be zero.
   The M bit allows the receiver to know that it has received the last
   packet of a frame so it can display the image without waiting for the
   first packet of the next frame to arrive to detect the frame change.
   However, detection of a frame change MUST NOT rely on the marker bit
   since the last packet of the frame might be lost.  Detection of a
   frame change MUST be done by differences in RTP timestamp.
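   The receiver rules above can be sketched as follows in Python: a
   change in RTP timestamp always closes the current frame, while the
   marker bit only lets a complete frame be released early.  This is a
   hypothetical helper that assumes packets arrive in order; it does
   not reorder packets by sequence number.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FrameAssembler:
    """Collect RTP payloads (DIF blocks) into frames keyed by RTP timestamp."""
    current_ts: Optional[int] = None
    parts: list[bytes] = field(default_factory=list)

    def push(self, ts: int, marker: bool, payload: bytes) -> list[tuple[int, bytes]]:
        """Add one packet; return any frames that are now complete."""
        done: list[tuple[int, bytes]] = []
        if self.current_ts is not None and ts != self.current_ts:
            # Timestamp changed: the previous frame is over, even though
            # its marker-bit packet may have been lost.
            done.append(self._flush())
        if self.current_ts is None:
            self.current_ts = ts
        self.parts.append(payload)
        if marker:
            # Marker bit: last packet of the frame, so release it without
            # waiting for the first packet of the next frame.
            done.append(self._flush())
        return done

    def _flush(self) -> tuple[int, bytes]:
        frame = (self.current_ts, b"".join(self.parts))
        self.current_ts, self.parts = None, []
        return frame
```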

4.2 DV data encapsulation into RTP payload

   An integral number of DIF blocks is placed into the RTP payload,
   beginning immediately after the RTP header.  Any number of DIF blocks
   may be packed into one RTP packet, except that all DIF blocks in one
   RTP packet must be from the same video frame.  DIF blocks from the
   next video frame MUST NOT be packed into the same RTP packet even if
   there is payload space remaining.  This requirement stems from the
   fact that the transition from one video frame to the next is
   indicated by a change in the RTP timestamp.  It also reduces the
   processing complexity on the receiver.  Since the RTP payload
   contains an integral number of DIF blocks, the length of the RTP
   payload will be a multiple of 80 bytes.
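   A sender-side sketch of these packing rules in Python: one frame's
   DIF blocks are cut into payloads holding as many whole 80-byte
   blocks as fit in a given maximum payload size, so no payload ever
   spans two frames.  The 1460-byte payload budget used in the usage
   note (a 1500-byte MTU minus IPv4, UDP and RTP headers) and the
   120000-byte 525/60 SD-VCR frame size are illustrative assumptions.

```python
DIF_BLOCK_SIZE = 80


def packetize_frame(frame: bytes, max_payload: int) -> list[bytes]:
    """Split one video frame into RTP payloads of whole DIF blocks.

    Only a single frame is ever passed in, so no payload can mix DIF
    blocks from two frames.  The caller gives every payload the frame's
    timestamp and sets the marker bit on the last one.
    """
    if len(frame) % DIF_BLOCK_SIZE:
        raise ValueError("frame is not a whole number of DIF blocks")
    blocks_per_packet = max_payload // DIF_BLOCK_SIZE
    if blocks_per_packet == 0:
        raise ValueError("max_payload is smaller than one DIF block")
    step = blocks_per_packet * DIF_BLOCK_SIZE
    return [frame[i:i + step] for i in range(0, len(frame), step)]
```

   With max_payload = 1460, each payload carries 18 DIF blocks (1440
   bytes), so a 120000-byte frame becomes 84 packets, the last one
   holding 480 bytes.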

   Audio and video data may be transmitted as one bundled RTP stream or
   in separate RTP streams (unbundled).  The choice MUST be indicated as
   part of the assignment of the dynamic payload type and MUST remain
   unchanged for the duration of the RTP session, to avoid complicated
   procedures for sequence number synchronization.

   In the case of one bundled stream, DIF blocks for both audio and
   video are packed into RTP packets in the same order as they were
   generated.  When audio and video are sent in unbundled streams, or
   when only one medium is sent, then only the DIF blocks corresponding
   to the selected medium are included.  If VAUX DIF blocks are
   included, they MUST only be sent in the video stream.  When using
   unbundled mode, it is RECOMMENDED that the audio stream data be
   extracted from the DIF blocks and repackaged into the corresponding

   RTP payload format for the audio encoding (L16, NL12, L20) in order
   to maximize interoperability with non-DV-capable receivers while
   preserving the original source quality [8,9].
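   Unbundled transmission amounts to filtering each frame's DIF blocks
   by type while preserving their order.  The Python sketch below
   assumes IEC61834 section type code points as we understand them
   (VAUX = 2, Audio = 3, Video = 4, in the top 3 bits of the first ID
   byte); which stream, if any, should carry the DIF sequence header
   and Subcode blocks is not specified here, so the caller chooses.
   All names are hypothetical.

```python
DIF_BLOCK_SIZE = 80

# Assumed section type codes (top 3 bits of the first ID byte).
SCT_VAUX, SCT_AUDIO, SCT_VIDEO = 2, 3, 4

VIDEO_STREAM = {SCT_VIDEO, SCT_VAUX}  # VAUX may only travel with video
AUDIO_STREAM = {SCT_AUDIO}


def select_blocks(frame: bytes, wanted: set[int]) -> bytes:
    """Keep, in their original order, only DIF blocks of the wanted types."""
    if len(frame) % DIF_BLOCK_SIZE:
        raise ValueError("frame is not a whole number of DIF blocks")
    out = bytearray()
    for i in range(0, len(frame), DIF_BLOCK_SIZE):
        block = frame[i:i + DIF_BLOCK_SIZE]
        if (block[0] >> 5) in wanted:
            out += block
    return bytes(out)
```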

   In the case of unbundled transmission, the same timestamp SHOULD be
   used for both audio and video data within the same frame to simplify
   lip synchronization at the receiver.  Lip synchronization may also
   be achieved using the reference timestamps passed in RTCP, as
   described in RFC 1889 [6].
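   When the two streams do not share timestamps, each RTP timestamp can
   be mapped to the sender's wallclock using the NTP/RTP timestamp pair
   from that stream's most recent RTCP sender report.  A minimal Python
   sketch, assuming the NTP time has already been converted to seconds;
   the function name is hypothetical.

```python
def rtp_to_wallclock(rtp_ts: int, sr_ntp_seconds: float, sr_rtp_ts: int,
                     clock_rate: int) -> float:
    """Sender wallclock time (seconds) of an RTP timestamp, computed from
    the NTP/RTP timestamp pair of the stream's latest RTCP sender report."""
    # Signed 32-bit difference, so timestamps slightly before the sender
    # report and timestamps across a 2**32 wrap are both handled.
    diff = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if diff >= 1 << 31:
        diff -= 1 << 32
    return sr_ntp_seconds + diff / clock_rate
```

   Video would use clock_rate = 90000 and, for example, an L16/32000
   audio stream would use 32000; samples whose wallclock times match
   are played together.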

   The sender MAY send null AAUX information and omit VAUX DIF blocks if
   the VAUX/AAUX information remains constant during the session.
   However, the VAUX/AAUX information in the DV stream includes source
   encoding parameters, such as video display aspect ratio, audio
   quantization and number of audio channels, which are required to
   decode the stream.  Therefore, if VAUX/AAUX information is not
   transmitted in the stream, the equivalent parameters essential to
   playout MUST be provided by some out-of-band means beyond the scope
   of this document.  The receiver MUST be able to process a data stream
   with null AAUX information and null or omitted VAUX DIF blocks if the
   equivalent parameters are provided out of band.  In particular, if
   the RTP receiver is feeding the DV stream to a device that requires
   AAUX information and VAUX DIF blocks, the receiver MUST be able to
   generate AAUX within audio DIF blocks and VAUX DIF blocks for the
   device using the parameters provided out of band.

   The sender MAY reduce the video frame rate by discarding the video
   data and VAUX DIF blocks for some of the video frames.  The RTP
   timestamp must still be incremented to account for the discarded
   frames.  The sender MAY alternatively reduce bandwidth by discarding
   video data DIF blocks for portions of the image which are unchanged
   from the previous image.  To enable this bandwidth reduction,
   receivers SHOULD implement an error concealment strategy to
   accommodate lost or missing DIF blocks by repeating the corresponding
   DIF block from the previous image.
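   The suggested error concealment can be sketched in Python as below:
   each DIF block is keyed by its position in the frame, decoded from
   the ID header under an assumed IEC61834 bit layout (section type,
   DIF sequence number, the FSC channel bit and DIF block number), and
   any position missing from the current frame is filled from the most
   recent frame that supplied it.  The list of expected positions
   depends on the video format and is left to the caller; all names are
   hypothetical.

```python
Key = tuple[int, int, int, int]


def block_key(block: bytes) -> Key:
    """Position of a DIF block: (section type, DIF sequence, FSC, block number).

    Assumed ID header layout: SCT in bits 7-5 of byte 0, DIF sequence
    number in bits 7-4 of byte 1, FSC in bit 3 of byte 1, DBN in byte 2.
    """
    return block[0] >> 5, block[1] >> 4, (block[1] >> 3) & 1, block[2]


class Concealer:
    """Replace DIF blocks missing from a frame with the previous image's block."""

    def __init__(self) -> None:
        self.last: dict[Key, bytes] = {}

    def repair(self, blocks: list[bytes], expected: list[Key]) -> list[bytes]:
        have = {block_key(b): b for b in blocks}
        out = []
        for key in expected:
            if key in have:
                out.append(have[key])
            elif key in self.last:
                out.append(self.last[key])  # repeat block from previous image
            # otherwise nothing is available yet and the block stays missing
        self.last.update(have)
        return out
```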

5. SDP Signaling for RTP/DV

   When SDP (Session Description Protocol) [10] is used to negotiate the
   RTP payload information, the format described in this document SHOULD
   be used.  The SDP description differs slightly between a bundled
   stream and an unbundled stream.

5.1 SDP description for unbundled stream

   When unbundled streams are used, the video and audio RTP streams are
   sent separately, to different ports or different multicast groups.
   In this case, the SDP description carries one "m=" line for each
   media stream (see RFC 2327 [10]).  For example, when video is sent to
   port 31394 using RTP payload type 111, the "m=" line will be:

        m=video 31394 RTP/AVP 111

   The a=rtpmap attribute will be:

        a=rtpmap:111 DV/90000

   "DV" is the encoding name for the DV video payload format defined in
   this document, and 90000 is the clock rate: the payload format
   defined in this document uses a 90 kHz clock.

   In SDP, format-specific parameters are conveyed by the a=fmtp
   attribute:

          a=fmtp:<format> <format specific parameters>

   For the DV video payload format, the a=fmtp line indicates which DV
   encoding is used, as follows:

          a=fmtp:<payload type> v-encode:<DV-video encoding>

   The <DV-video encoding> parameter describes which DV format is used
   and takes one of the following values:

         o  SD-VCR/525-60
         o  SD-VCR/625-50
         o  HD-VCR/1125-60
         o  HD-VCR/1250-50
         o  SDL-VCR/525-60
         o  SDL-VCR/625-50
         o  306M/525-60
         o  306M/625-50
         o  314M-25/525-60
         o  314M-25/625-50
         o  314M-50/525-60
         o  314M-50/625-50
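   A Python sketch of producing and checking these attribute lines is
   shown below.  It follows the a=rtpmap and "v-encode:" fmtp syntax
   given in this section; the function names are hypothetical, and the
   parser deliberately accepts only the single v-encode parameter.

```python
DV_ENCODINGS = {
    "SD-VCR/525-60", "SD-VCR/625-50", "HD-VCR/1125-60", "HD-VCR/1250-50",
    "SDL-VCR/525-60", "SDL-VCR/625-50", "306M/525-60", "306M/625-50",
    "314M-25/525-60", "314M-25/625-50", "314M-50/525-60", "314M-50/625-50",
}


def dv_sdp_lines(pt: int, encoding: str, bundled: bool = False) -> list[str]:
    """a=rtpmap and a=fmtp lines for one DV payload type ("BDV" if bundled)."""
    if encoding not in DV_ENCODINGS:
        raise ValueError(f"unknown DV encoding: {encoding}")
    name = "BDV" if bundled else "DV"
    return [f"a=rtpmap:{pt} {name}/90000", f"a=fmtp:{pt} v-encode:{encoding}"]


def parse_fmtp(line: str) -> tuple[int, str]:
    """Extract (payload type, DV encoding) from an a=fmtp line."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an fmtp line")
    pt, _, params = line[len(prefix):].partition(" ")
    key, _, value = params.strip().partition(":")
    if key != "v-encode" or value not in DV_ENCODINGS:
        raise ValueError(f"unrecognized DV parameters: {params!r}")
    return int(pt), value
```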

   An example of SDP description using these attributes is:

      v=0
      o=mhandley 2890844526 2890842807 IN IP4 126.16.64.4
      s=SDP Seminar
      i=A Seminar on the session description protocol
      u=http://www.cs.ucl.ac.uk/staff/M.Handley/sdp.03.ps
      e=mjh@isi.edu (Mark Handley)
      c=IN IP4 224.2.17.12/127
      t=2873397496 2873404696
      m=audio 49170 RTP/AVP 112
      a=rtpmap:112 L16/32000/2
      m=video 50000 RTP/AVP 113
      a=rtpmap:113 DV/90000
      a=fmtp:113 v-encode:SD-VCR/525-60

   This describes a session in which audio and video streams are sent
   separately.  The session is sent to the multicast group 224.2.17.12.
   The audio is sent in L16 format (32 kHz, 2 channels), and the video
   is sent in SD-VCR 525/60 format, which corresponds to the NTSC format
   in consumer DV.

5.2 SDP description for bundled stream

   When sending a bundled stream, all DIF blocks, including system data,
   are sent in a single RTP stream.  DV defines many audio format
   attributes, such as sampling frequency, quantization, number of audio
   channels, channel assignment, language, emphasis and picture frame
   locking.  These attributes are carried as AAUX data within audio DIF
   blocks.  Describing all of them in SDP would require a large number
   of entry definitions, one per attribute, and a much larger SDP record
   to enumerate the combinations of attributes.  Therefore, no SDP entry
   describing the audio format attributes is defined for DV over RTP.
   Instead, the attribute information for the audio and video formats is
   taken from the AAUX data carried within audio DIF blocks and from the
   VAUX DIF blocks, respectively.  The AAUX and VAUX data associated
   with the encoding attributes appear multiple times within one picture
   frame, so even if some AAUX or VAUX data is lost, the receiver can
   recover the lost part from other data in the same video frame.  In
   order for the receiver to know the audio format, the RTP sender MUST
   transmit AAUX data in the audio DIF blocks, including at least the
   AAUX source pack and the AAUX source control pack, when using this
   mode.  The encoding name for bundled DV streams is defined as "BDV"
   in this document.  The a=rtpmap attribute in the session description
   will look like:

        a=rtpmap:111 BDV/90000

   The DV video encoding format is indicated by the "fmtp" attribute,
   using the same format as described in Section 5.1.

   An example SDP description for a bundled DV stream is:

      v=0
      o=mhandley 2890844526 2890842807 IN IP4 126.16.64.4
      s=SDP Seminar

      i=A Seminar on the session description protocol
      u=http://www.cs.ucl.ac.uk/staff/M.Handley/sdp.03.ps
      e=mjh@isi.edu (Mark Handley)
      c=IN IP4 224.2.17.12/127
      t=2873397496 2873404696
      m=video 49170 RTP/AVP 112 113
      a=rtpmap:112 BDV/90000
      a=fmtp:112 v-encode:SD-VCR/525-60
      a=rtpmap:113 BDV/90000
      a=fmtp:113 v-encode:306M/525-60

   The SDP description above describes a session in which audio and
   video are sent bundled.  The session is sent to the multicast group
   224.2.17.12.  The stream uses 525/60 consumer DV when the payload
   type is 112, and SMPTE 306M 525/60 when the payload type is 113.

6. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [6], and any appropriate RTP profile.  This implies
   that confidentiality of the media streams is achieved by encryption.
   Because the data compression used with this payload format is applied
   end-to-end, encryption may be performed after compression, so there
   is no conflict between the two operations.

   A potential denial-of-service threat exists for data encodings using
   compression techniques that have non-uniform receiver-end
   computational load.  The attacker can inject pathological datagrams
   into the stream which are complex to decode and cause the receiver to
   be overloaded.  However, this encoding does not exhibit any
   significant non-uniformity.

   As with any IP-based protocol, in some circumstances a receiver may
   be overloaded simply by the receipt of too many packets, either
   desired or undesired.  Network-layer authentication may be used to
   discard packets from undesired sources, but the processing cost of
   the authentication itself may be too high.  In a multicast
   environment, pruning of specific sources may be implemented in future
   versions of IGMP [11] and in multicast routing protocols to allow a
   receiver to select which sources are allowed to reach it.

7. Full Copyright Statement

   Copyright (C) The Internet Society (1999). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published

   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.

   However, this document itself may not be modified in any way, such as
   by removing the copyright notice or references to the Internet Soci-
   ety or other Internet organizations, except as needed for the purpose
   of developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be fol-
   lowed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MER-
   CHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."

8. Authors' Addresses

   Katsushi Kobayashi
   Communication Research Laboratory
   4-2-1 Nukii-kitamachi, Koganei
   Tokyo 184-8795 JAPAN
   EMail: ikob@koganei.wide.ad.jp

   Akimichi Ogawa
   Keio University
   5322 Endo, Fujisawa
   Kanagawa 252 JAPAN
   EMail: akimichi@sfc.wide.ad.jp

   Stephen L. Casner
   Cisco Systems, Inc.
   170 West Tasman Drive
   San Jose, CA 95134-1706
   United States
   EMail: casner@cisco.com

   Carsten Bormann
   Universitaet Bremen FB3 TZI
   Postfach 330440
   D-28334 Bremen, GERMANY
   Phone: +49.421.218-7024
   Fax:   +49.421.218-7000
   EMail: cabo@tzi.org

9. Bibliography

   [1] IEC 61834, Helical-scan digital video cassette recording system
       using 6,35 mm magnetic tape for consumer use (525-60, 625-50,
       1125-60 and 1250-50 systems).

   [2] IEC 61883, Consumer audio/video equipment - Digital interface.

   [3] IEEE Std 1394-1995, Standard for a High Performance Serial Bus.

   [4] SMPTE 306M, 6.35-mm type D-7 component format - video
       compression at 25 Mb/s - 525/60 and 625/50.

   [5] SMPTE 314M, Data structure for DV-based audio and compressed
       video 25 and 50 Mb/s.

   [6] H. Schulzrinne, S. Casner, R. Frederick and V. Jacobson, "RTP: A
       Transport Protocol for Real-Time Applications", RFC 1889, January
       1996.

   [7] D. Hoffman, G. Fernando, V. Goyal and M. Civanlar, "RTP Payload
       Format for MPEG1/MPEG2 Video", RFC 2250, January 1998.

   [8] H. Schulzrinne, "RTP Profile for Audio and Video Conferences with
       Minimal Control", RFC 1890, January 1996.

   [9] K. Kobayashi, A. Ogawa, S. Casner and C. Bormann, "RTP Payload
       Format for nonlinear 12 bits and 20 bits Audio", Internet-Draft,
       work in progress.

   [10] M. Handley and V. Jacobson, "SDP: Session Description Protocol",
        RFC 2327, April 1998.

   [11] S. Deering, "Host Extensions for IP Multicasting", STD 5,
        RFC 1112, August 1989.

Appendix A. Sequence number difference in DV over IEEE 1394

   The Digital Interface specification [2] defines a transport protocol,
   called the "real time data transmission protocol", for transmitting
   video stream data in the isochronous stream mode of IEEE 1394.  The
   protocol defines the general Common Isochronous Packet (CIP) header,
   which does not depend on the encoding format of the payload.  Several
   real time transmission encodings have been defined on top of CIP,
   including MPEG2 and MIDI in addition to the DV format [1,2].  All of
   the information in the CIP header is either implicit in the RTP
   payload format or supplanted by information in the RTP header, so the
   CIP header is not required.  For this payload format, the CIP header
   MUST be removed from each IEEE 1394 packet, leaving just a sequence
   of DIF blocks.
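   A minimal Python sketch of removing the CIP header from an
   isochronous packet payload is given below.  It assumes the CIP
   header is two 32-bit quadlets (8 bytes), as we understand IEC61883
   to define it, and that the isochronous packet header has already
   been removed; packets carrying only a CIP header yield no DIF
   blocks.  The names are hypothetical.

```python
CIP_HEADER_SIZE = 8   # assumed: two 32-bit quadlets
DIF_BLOCK_SIZE = 80


def strip_cip(iso_payload: bytes) -> bytes:
    """Drop the CIP header from an isochronous payload, leaving DIF blocks."""
    if len(iso_payload) < CIP_HEADER_SIZE:
        raise ValueError("packet shorter than a CIP header")
    dif = iso_payload[CIP_HEADER_SIZE:]
    if len(dif) % DIF_BLOCK_SIZE:
        raise ValueError("data after the CIP header is not whole DIF blocks")
    return dif  # empty for packets that carry only a CIP header
```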

   The CIP header for DV video includes the SYT field.  The SYT is a
   16-bit timestamp copied from the lower 16 bits of the CYCLE_TIME
   register defined in IEEE 1394.  The CYCLE_TIME register is
   incremented by a 24.576 MHz clock, but its lower 12 bits count only
   up to 3071 before wrapping around to zero and adding a carry to the
   high 4 bits.  Therefore, the SYT timestamp does not increment
   linearly.  The RTP timestamp could be derived from the SYT in the CIP
   header, but implementers must take care of the non-linear behavior
   of the SYT field.
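   Because SYT wraps every 16 x 3072 = 49152 ticks of the 24.576 MHz
   clock (2 ms), it can only refine, not replace, the nominal frame
   interval.  The Python sketch below linearizes SYT values and
   corrects the nominal interval by the SYT difference, assuming the
   actual interval is within 1 ms of nominal; the names and this
   unwrapping strategy are illustrative assumptions rather than part of
   this specification.

```python
from fractions import Fraction

TICKS_PER_CYCLE = 3072                      # low 12 bits count 0..3071
SYT_WRAP = 16 * TICKS_PER_CYCLE             # 4-bit carry: wraps every 49152 ticks
TICKS_TO_90KHZ = Fraction(90000, 24576000)  # == 15/4096


def syt_to_ticks(syt: int) -> int:
    """Map a 16-bit SYT onto a linear 24.576 MHz tick count modulo SYT_WRAP."""
    if syt == 0xFFFF:
        raise ValueError("SYT 'no information' value carries no timestamp")
    cycle, offset = (syt >> 12) & 0xF, syt & 0xFFF
    if offset >= TICKS_PER_CYCLE:
        raise ValueError("low 12 bits out of range")
    return cycle * TICKS_PER_CYCLE + offset


def frame_interval_90khz(prev_syt: int, syt: int, nominal_90khz: int) -> Fraction:
    """Measured interval between two frames, in 90 kHz units.

    SYT only resolves time modulo 2 ms, so the nominal interval supplies
    the coarse part and the SYT difference the fine correction.
    """
    nominal_ticks = Fraction(nominal_90khz) / TICKS_TO_90KHZ
    observed = (syt_to_ticks(syt) - syt_to_ticks(prev_syt)) % SYT_WRAP
    # Signed deviation of the observed residue from the nominal residue.
    error = (observed - nominal_ticks) % SYT_WRAP
    if error >= SYT_WRAP // 2:
        error -= SYT_WRAP
    return (nominal_ticks + error) * TICKS_TO_90KHZ
```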

   If the encoding format requires synchronization between devices
   connected to the same IEEE 1394 network, it should use the CIP header
   with SYT.  The CIP header for DV reserves the SYT field, but a valid
   SYT timestamp value is required only once in each video frame.  In
   the remaining CIP headers, the SYT field may be filled with the "no
   information" value (all ones).
