Internet Engineering Task Force                     Audio-Video Transport WG
INTERNET-DRAFT draft-ietf-avt-rtp-03.txt            H. Schulzrinne/S. Casner
                                                                    AT&T/ISI
                                                          September 15, 1993
                                                          Expires:  11/01/93


            RTP: A Transport Protocol for Real-Time Applications



Status of this Memo


This document is an Internet Draft.   Internet Drafts are working  documents
of the Internet Engineering  Task Force (IETF), its  Areas, and its  Working
Groups.   Note that other  groups may also  distribute working documents  as
Internet Drafts.

Internet Drafts  are draft  documents valid  for a  maximum of  six  months.
Internet Drafts may be  updated, replaced, or  obsoleted by other  documents
at any time.   It  is not appropriate  to use Internet  Drafts as  reference
material or to  cite them other  than as  a ``working draft''  or ``work  in
progress.''

Please check  the I-D  abstract  listing contained  in each  Internet  Draft
directory to learn the current status of this or any other Internet Draft.

Distribution of this document is unlimited.


                                  Abstract


     This  memorandum describes a protocol called  RTP suitable for the
    end-to-end  network transport  of real-time  data,  such  as audio,
    video or simulation  data for both multicast  and unicast transport
    services.   The data transport  is augmented by  a control protocol
    (RTCP)  designed  to  provide minimal  control  and  identification
    functionality particularly in multicast networks.  RTP and RTCP are
    designed to be independent of  the underlying transport and network
    layers.  The protocol supports the use of RTP-level translators and
    bridges.   Within multicast associations, sites  can direct control
    messages  to individual  sites.    The  protocol  does not  address
    resource reservation and does  not guarantee quality-of-service for
    real-time services.


This specification is a product  of the Audio-Video Transport working  group
within the Internet  Engineering Task  Force.   Comments  are solicited  and
should be addressed to the  working group's mailing list at  rem-conf@es.net
and/or the authors.



Contents


1 Introduction                                                             3

2 RTP Protocol Use Scenarios                                               5

  2.1 Simple Multicast Audio Conference . . . . . . . . . . . . . . . . . 5

  2.2 Bridges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

  2.3 Translators . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

  2.4 Security  . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  7


3 Definitions                                                              7

4 Byte Order, Alignment, and Reserved Values                              10


5 Real-Time Data Transfer Protocol -- RTP                                 10

  5.1 RTP Fixed Header Fields . . . . . . . . . . . . . . . . . . . . . . 10

  5.2 The RTP Options . . . . . . . . . . . . . . . . . . . . . . . . . . 12

    5.2.1 CSRC: Content source identifiers  . . . . . . . . . . . . . . . 13

    5.2.2 SSRC: Synchronization source identifier . . . . . . . . . . . . 13

    5.2.3 BOS: Beginning of synchronization unit  . . . . . . . . . . . . 14

  5.3 Reverse-Path Option . . . . . . . . . . . . . . . . . . . . . . . . 14

    5.3.1 SDST: Synchronization destination identifier  . . . . . . . . . 15

  5.4 Security Options  . . . . . . . . . . . . . . . . . . . . . . . . . 16

    5.4.1 ENC: Encryption . . . . . . . . . . . . . . . . . . . . . . . . 18

    5.4.2 MIC: Message integrity check  . . . . . . . . . . . . . . . . . 19

    5.4.3 MICA: Message integrity check, asymmetric encryption  . . . . . 20

    5.4.4 MICK: Message integrity check, keyed  . . . . . . . . . . . . . 21

    5.4.5 MICS: Message integrity check, symmetric-key encrypted  . . . . 22


6 Real Time Control Protocol --- RTCP                                     22

  6.1 FMT: Format description . . . . . . . . . . . . . . . . . . . . . . 23

  6.2 SDES: Source descriptor . . . . . . . . . . . . . . . . . . . . . . 24

  6.3 BYE: Goodbye  . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

  6.4 QOS: Quality of service measurement . . . . . . . . . . . . . . . . 27

7 Security Considerations                                                 28


8 RTP over Network and Transport Protocols                                29

  8.1 Defaults  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

  8.2 ST-II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

A Implementation Notes                                                    30

  A.1 Timestamp Recovery  . . . . . . . . . . . . . . . . . . . . . . . . 31

  A.2 Detecting the Beginning of a Synchronization Unit . . . . . . . . . 32

  A.3 Demultiplexing and Locating the Synchronization Source  . . . . . . 33

  A.4 Parsing RTP Options . . . . . . . . . . . . . . . . . . . . . . . . 33

  A.5 Determining the Expected Number of RTP Packets  . . . . . . . . . . 34


B Addresses of Authors                                                    35


1 Introduction


This memorandum specifies a  transport protocol for real-time  applications.
A discussion of real-time services  and algorithms for their  implementation
and some of the  RTP design decisions  can be found  in the current  version
of the  companion  Internet  draft draft-ietf-avt-issues.     The  transport
protocol provides  end-to-end  delivery  services for  data  with  real-time
characteristics, for  example,  interactive audio  and video.    RTP  itself
does not provide any  mechanism to ensure timely  delivery or provide  other
quality-of-service guarantees, but relies on lower-layer services to do  so.
It does not guarantee  delivery or prevent  out-of-order delivery, nor  does
it assume that the  underlying network is reliable  and delivers packets  in
sequence.   The sequence numbers  included in  RTP allow the  end system  to
reconstruct the sender's packet  sequence, but  sequence numbers might  also
be used to determine the proper location  of a packet, for example in  video
decoding, without necessarily decoding  packets in sequence.   RTP does  not
provide quality-of-service guarantees.   RTP  is designed to  run on top  of
a variety of  network and  transport protocols,  for example,  IP, ST-II  or
UDP.(1) RTP  transfers data  in a  single  direction, possibly  to  multiple
destinations if  supported by  the  underlying network.    A  mechanism  for
sending control data in the opposite direction, reversing the path traversed
by regular data, is provided.

While RTP is primarily  designed to satisfy  the needs of  multi-participant
multimedia conferences, it  is not limited  to that particular  application.
Storage of  continuous  data,  interactive  distributed  simulation,  active
badge,  and  control  and   measurement  applications  may  also  find   RTP
applicable.   Profiles are  used to  instantiate certain  header fields  and
options for particular sets of applications.  A profile for audio and  video
data may be found in the companion Internet draft draft-ietf-avt-profile.

The current  Internet  does not  support  the widespread  use  of  real-time
services.     High-bandwidth  services  using   RTP,  such  as  video,   can
potentially seriously degrade other  network services.   Thus,  implementors
should take  appropriate precautions  to limit  accidental bandwidth  usage.
Application  documentation  should  clearly  outline  the  limitations   and
possible operational  impact of  high-bandwidth  real-time services  on  the
Internet and other network services.

This document defines a packet format shared by two protocols:


  o the real-time  transport protocol  (RTP),  for exchanging data that  has
    real-time  properties.    The  RTP  header consists  of  a  fixed-length
    portion plus optional control fields;

  o the RTP  control protocol  (RTCP), for conveying  information about  the
    participants  in an  on-going  session.    RTCP consists  of  additional
    header options  that may  be ignored  without affecting  the ability  to
    receive  data correctly.     RTCP  is used  for  ``loosely  controlled''
    sessions,  i.e.,  where there  is  no explicit  membership  control  and
    set-up.    Its  functionality  may  be subsumed  by  a  session  control
    protocol, which is beyond the scope of this document.

------------------------------
 1. For most  applications, RTP  offers insufficient  demultiplexing to  run
directly on IP.



2 RTP Protocol Use Scenarios


The following sections describe some aspects of the use of RTP. The examples
were chosen to  illustrate the  basic operation of  applications using  RTP,
not to  limit  what RTP  may  be  used for.    In  these  examples,  RTP  is
carried on top  of IP and  UDP, and follows  the conventions established  by
the profile for audio  and video specified in  the companion Internet  draft
draft-ietf-avt-profile.



2.1 Simple Multicast Audio Conference


A working group  of the  IETF meets to  discuss the  latest protocol  draft,
using the IP multicast  services of the  Internet for voice  communications.
Through some  allocation  mechanism,  the  working  group  chair  obtains  a
multicast group  address;  all participants  use  the destination  UDP  port
specified by the profile.   The multicast address and port are  distributed,
say, by electronic mail, to all  intended participants.  The mechanisms  for
discovering available multicast addresses  and distributing the  information
to participants are beyond the scope of RTP.

The audio conferencing application used by each conference participant sends
audio data in small  chunks of, say, 20  ms duration.   Each chunk of  audio
data is preceded by an RTP header; RTP header and data are in turn contained
in a UDP packet.   The  Internet, like  other packet networks,  occasionally
loses and reorders packets and delays them by variable amounts of time.   To
cope with these impairments, the RTP header contains timing information  and
a sequence number that  allow the receivers to  reconstruct the timing  seen
by the source, so that,  in our case, a chunk  of audio is delivered to  the
speaker every 20 ms.  The sequence  number can also be used by the  receiver
to estimate how many packets are being lost.  Each RTP packet also indicates
what type of  audio encoding  (such as  PCM, ADPCM  or GSM)  is being  used,
so that senders can  change the encoding during  a conference, for  example,
to accommodate a new participant  that is connected through a  low-bandwidth
link.
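
As a rough illustration of how a receiver might estimate loss from the
sequence number, the following Python sketch compares the number of packets
expected with the number actually received.  The function name and the
handling of wraparound are illustrative assumptions rather than part of this
specification; the sketch assumes the 16-bit sequence number wrapped at most
once during the observed interval.

```python
SEQ_MOD = 1 << 16  # the sequence number field is 16 bits wide


def estimate_lost(first_seq, last_seq, received):
    """Estimate packets lost between first_seq and last_seq (inclusive).

    Assumes fewer than 65536 packets were sent in the interval, so the
    sequence number wrapped around at most once.
    """
    expected = (last_seq - first_seq) % SEQ_MOD + 1
    # Duplicates can make 'received' exceed 'expected'; never report
    # a negative loss count.
    return max(expected - received, 0)
```

For example, if the first packet seen carried sequence number 10, the last
one 19, and eight packets arrived, the estimate is two lost packets.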

Since members of the working group join and leave during the conference,  it
is useful to know  who is participating  at any moment.   For that  purpose,
each instance  of  the  audio application  in  the  conference  periodically
multicasts the name, email address and other information of its user.   Such
control information is  carried as  RTCP SDES options  within RTP  messages,
with or  without audio  data (see  Section 6.2).    These periodic  messages
also provide some indication as to  whether the network connection is  still
functioning.   A  site  sends the  RTCP BYE  (Section  6.3) option  when  it
leaves a conference.  The RTCP  QOS (Section 6.4) option indicates how  well
the current speaker is  being received and may  be used to control  adaptive
encodings.



2.2 Bridges


So far, we have  assumed that all  sites want to receive  audio data in  the
same format.    However,  this may  not  always be  appropriate.    Consider
the case where participants  in one area are  connected through a  low-speed
link to the majority  of the conference  participants, who enjoy  high-speed
network access.    Instead of  forcing everyone  to use  a  lower-bandwidth,
reduced-quality audio encoding,  a bridge is  placed near the  low-bandwidth
area.  This bridge resynchronizes incoming audio packets to reconstruct  the
constant 20 ms spacing  generated by the  sender, mixes these  reconstructed
audio streams, translates the  audio encoding to  a lower-bandwidth one  and
forwards the lower-bandwidth packet stream to the low-bandwidth sites.

After the mixing, the identity of  the high-speed site that is speaking  can
no longer be determined from the network  origin of the packet.   Therefore,
the bridge inserts a CSRC option (Section 5.2.1) into the packet  containing
a list of short site  identifiers to indicate which site(s)  ``contributed''
to that mixed packet.  An example of this is shown for bridge B1 in  Fig. 1.
As name and location information is  received by the bridge in SDES  options
from the high-speed sites,  that information is passed  on to the  receivers
along with a mapping to the CSRC identifiers.



      [E1]                                    [E6]
       |                                       |
 E1:17 |                                 E6:15 |
       |                                       |   E6:63/6
       V   B1:48 (1,2)         B1:28/1 (1,2)   V   B1:63/5 (1,2)
      (B1)-------------><T1>-----------------><T2>--------------->[E7]
       ^                 ^     E4:28/2         ^   E4:63/3
  E2:1 |           E4:47 |                     |   B3:63/4 (1,4)
       |                 |                     |
      [E2]              [E4]                   |
                                               |            LEGEND:
[E3] --------->(B2)----------->(B3)------------|          [End system]
       E3:64        B2:12 (3)   ^                         (Bridge)
                                | E5:45                   <Translator>
                                |
                               [E5]     content: source port/SSRC (CSRCs)
                                        -------------------------------->
  Figure 1:  Sample RTP network with end systems, bridges and translators



2.3 Translators


Not all  sites are  directly accessible  through IP  multicast.   For  these
sites, mixing may not be necessary, but a translation of the underlying
transport protocol is.   RTP-level  gateways that do  not restore timing  or
mix packets from different sources are called translators in this  document.
Application-level firewalls, for example, will not let any IP packets  pass.
Two translators  are installed,  one on  either side  of the  firewall,  the
outside one  funneling all  multicast packets  received through  the  secure
connection to the  translator inside the  firewall.   The translator  inside
the firewall sends  them again  as multicast  packets to  a multicast  group
restricted to  the site's  internal network.    Other examples  include  the
connection of a group of hosts speaking only IP/UDP to a group of hosts that
understand only ST-II.

After RTP  packets have  passed through  a translator,  they all  carry  the
network source  address of  the translator,  making  it impossible  for  the
receiver to distinguish  packets from  different speakers  based on  network
source addresses.    Since each  sending site  has its  own sequence  number
space and slightly offset timestamp  space, the receiver could not  properly
mix the audio  packets.   (For video,  it could not  properly separate  them
into distinct displays.)   Instead  of forcing all  senders to include  some
globally unique identifier  in each  packet,  a translator  inserts an  SSRC
option (Section  5.2.2) with  a  short identifier  for  the source  that  is
locally unique to the translator.  This  also works if an RTP packet has  to
travel through several translators, with the SSRC value being mapped into  a
new locally unique value at each translator.  An example is shown in Fig. 1,
where hosts T1 and  T2 are translators.   The RTP packets  from host E4  are
identified with SSRC value 2, while those coming from bridge B1 are  labeled
with SSRC value 1.  Similarly, translator T2 labels packets from E6, B1,  E4
and B3 with SSRC values  6, 5, 3 and 4,  respectively (or some other  unique
values).
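
The mapping performed by a translator can be pictured as a small table
keyed by where a packet came from.  The Python sketch below is only an
illustration under stated assumptions: the class name, the method name and
the choice to hand out consecutive values starting at one are hypothetical,
and the size of the SSRC identifier is deliberately left open.

```python
class SsrcMapper:
    """Hypothetical translator table assigning locally unique SSRC values."""

    def __init__(self):
        self._table = {}  # (source address, incoming SSRC) -> outgoing SSRC
        self._next = 1    # zero is the value assumed when no SSRC is present

    def outgoing_ssrc(self, source_addr, incoming_ssrc=0):
        """Return the SSRC value to insert for packets from this source."""
        key = (source_addr, incoming_ssrc)
        if key not in self._table:
            self._table[key] = self._next
            self._next += 1
        return self._table[key]
```

With this sketch, a translator in the position of T1 in Fig. 1 would label
the first source it hears (say, bridge B1) with 1 and the next (host E4)
with 2, and would keep returning the same value for later packets from the
same source.  A second translator keys on both the upstream translator's
address and the SSRC value it inserted, so the streams stay distinct.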



2.4 Security


Conference participants  would often  like to  ensure that  nobody else  can
listen to their deliberations.  Encryption, indicated by the presence of the
ENC option (Section 5.4.1),  provides that privacy.   The encryption  method
and key can be changed during the conference by indexing into a table.   For
example, a meeting may go into  executive session, protected by a  different
encryption key accessible only to a subset of the meeting participants.

For authentication, a number of methods are provided, depending on needs and
computational capabilities.  All these message integrity check (MIC) options
(Sections 5.4.2 and following)  compute cryptographic checksums, also  known
as message digests, over the RTP data.


3 Definitions


Payload is the data following the RTP fixed header and the RTP/RTCP options.
The payload format  and interpretation are  beyond the scope  of this  memo.
RTP packets without payload  are valid.   Examples of payload include  audio
samples and video data.

An RTP  packet  consists  of  the encapsulation  specific  to  a  particular
underlying protocol, the fixed RTP header, RTP and RTCP options (if any) and
the payload, if any.  A single packet of the underlying protocol may contain
several RTP packets if permitted by the encapsulation method.

A (protocol)  port is  the  ``abstraction that  transport protocols  use  to
distinguish among  multiple  destinations  within  a  given  host  computer.
TCP/IP protocols identify  ports using  small positive  integers.'' [1]  The
transport selectors (TSEL) used by the OSI transport layer are equivalent to
ports.

A transport address denotes  the combination of  network address, e.g.,  the
4-octet IP Version 4 address, and the transport protocol port, e.g., the UDP
port.  In  OSI systems,  the transport address  is called transport  service
access point or TSAP. The destination transport address may be a unicast  or
multicast address.

A content  source  is the  actual  source of  the  data carried  in  an  RTP
packet, for example,  the application that  originally generated some  audio
data.  Data from one or more  content sources may be combined into a  single
RTP packet by a bridge, which  becomes the synchronization source (see  next
paragraph).  Content  sources identify the logical  source of the data,  for
example, to highlight the current speaker in an audio conference; they  have
no effect on the delivery or playout timing of the data itself.  In  Fig. 1,
E1 and E2 are the content sources of the data received by E7 from bridge B1,
while B1 is the synchronization source.

A synchronization source is the combination  of one or more content  sources
with its  own timing.    Each synchronization  source has  its own  sequence
number space.   The  audio coming  from a  single microphone  and the  video
from a  camera  are examples  of  synchronization  sources.    The  receiver
groups packets by synchronization source for  playback.  Typically a  single
synchronization source emits  a single  medium (e.g.,  audio or video).    A
synchronization source may  change its  data format,  e.g., audio  encoding,
over time.    Synchronization  sources  are identified  by  their  transport
address and the identifier carried in the  SSRC option.  If the SSRC  option
is absent, a value of zero is assumed for that identifier.
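
The identification rule above can be sketched in Python as follows.  The
function names and the representation of a packet as a
(transport source, SSRC, payload) tuple are assumptions made for
illustration only; the sketch takes "transport address" to mean the
transport source of the packet as seen by the receiver.

```python
from collections import defaultdict


def sync_source_key(transport_source, ssrc=None):
    """Return the key identifying a synchronization source.

    A missing SSRC option is treated as the value zero.
    """
    return (transport_source, 0 if ssrc is None else ssrc)


def group_by_sync_source(packets):
    """Group (transport_source, ssrc, payload) tuples for playback."""
    streams = defaultdict(list)
    for transport_source, ssrc, payload in packets:
        streams[sync_source_key(transport_source, ssrc)].append(payload)
    return dict(streams)
```

A receiver in the position of E7 in Fig. 1 would see every packet arrive
from the same transport source (T2, port 63) and would therefore rely on
the SSRC value alone to keep the streams apart.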

A transport source is the transport-level origin of the RTP packets as  seen
by the receiving end system.  In  Fig. 1, host T2, port 63 is the  transport
source of all packets received by end system E7.

A  channel  comprises  all  synchronization  sources  sending  to  the  same
destination transport address using the same RTP channel identifier.

An end system generates the content to  be sent in RTP packets and  consumes
the content of received RTP packets.  An end system can act as one or more
synchronization sources.   (Most  end systems are  expected to  be a  single
synchronization source.)

An (RTP-level)  bridge  receives  RTP  packets from  one  or  more  sources,
combines them in some manner and then forwards  a new RTP packet.  A  bridge
may change the data format.   Since the timing among multiple input  sources
will not generally be synchronized, the bridge will make timing  adjustments
among the  streams and  generate its  own timing  for the  combined  stream.
Therefore, bridges are  synchronization sources,  with each  of the  sources
whose packets  were combined  into an  outgoing RTP  packet as  the  content
sources for that outgoing  packet.  Audio  bridges and media converters  are
examples of bridges.  In Fig. 1,  end systems E1 and E2 use the services  of
bridge B1.  B1 inserts CSRC identifiers  for E1 and E2 when they are  active
(e.g., talking in an audio conference).  The RTP-level bridges described  in
this document are unrelated  to the data link-layer  bridges found in  local
area networks.  If there is a possibility of confusion, the term 'RTP-level
bridge' should be used.   The name  bridge follows common  telecommunication
industry usage.

An (RTP-level) translator  forwards RTP packets,  but does  not alter  their
sequence numbers  or timestamps.    Examples  of  its use  include  encoding
conversion without mixing or retiming, conversion from multicast to unicast,
and application-level  filters in  firewalls.   A  translator is  neither  a
synchronization nor  a  content source.     The properties  of  bridges  and
translators are summarized in Table 1.  An x in parentheses designates a
possible but unlikely action.  The RTP options are explained in Section 5.2,
the RTCP options in Section 6.



                                    end sys.  bridge  translator
           mix sources                 --        x        --
           change encoding             N/A       x        x
           encrypt                      x        x       (x)
           sign for authentication      x        x        --
           alter content                x        x        x
           insert CSRC (RTP)           --        x        --
           insert SSRC (RTP)            x        x        x
           insert SDST (RTP)            x        x        --
           insert SDES (RTCP)           x        x        --


      Table 1:  The properties of end systems, bridges and translators

A synchronization unit  consists of  one or  more packets  that are  emitted
contiguously by  the sender.    The most  common synchronization  units  are
talkspurts for voice  and frames  for video  transmission.   During  playout
synchronization, the receiver must  reconstruct exactly the time  difference
between packets within a synchronization unit.  The time difference  between
synchronization units may be changed by the receiver to compensate for clock
drift or to adjust to changing network delay jitter.  For example, if  audio
packets are generated  at fixed  intervals during  talkspurts, the  receiver
has to play back packets  with exactly the same spacing.   However, if,  for
example, a silence period  between synchronization units (talkspurts)  lasts
600 ms,  the receiver  may adjust  it to,  say,  500 ms  without this  being
noticed by the listener.

Non-RTP mechanisms  refers to  other protocols  and mechanisms  that may  be
needed to  provide  a  useable  service.    In  particular,  for  multimedia
conferences, a conference control application may distribute encryption  and
authentication keys,  negotiate the  encryption algorithm  to be  used,  and
determine the mapping from  the RTP format field  to the actual data  format
used.  For simple applications, electronic mail or a conference database may
also be used.  The specification of such mechanisms is outside the scope  of
this memorandum.



4 Byte Order, Alignment, and Reserved Values


All integer  fields  are  carried in  network  byte  order,  that  is,  most
significant byte  (octet) first.    This  byte order  is commonly  known  as
big-endian.  The transmission order is described in detail in [2],  Appendix
A. Unless  otherwise noted,  numeric  constants are  in decimal  (base  10).
Numeric constants prefixed by '0x' are in hexadecimal.

Fields within the fixed header and within options are aligned to the natural
length of  the field,  i.e., 16-bit  words are  aligned  on even  addresses,
32-bit long words are aligned at addresses  divisible by four, etc.   Octets
designated as padding have the value zero.

Textual information is encoded according to the UTF-2 encoding of the ISO
standard 10646 (Annex F) [3,4].  US-ASCII  is a subset of this encoding and
requires no additional encoding.   The presence of multi-octet encodings  is
indicated by setting the most significant bit to  a value of one.  An  octet
with a binary value of zero may  be used as a string terminator for  padding
purposes.  However, strings are not required to be zero terminated.
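
The text-encoding rules can be illustrated with the short Python sketch
below.  It rests on two assumptions that are not stated in this memo: that
UTF-2 produces the same octets as the encoding now known as UTF-8 (so that
Python's "utf-8" codec can stand in for it), and that text fields are padded
to a multiple of four octets.  All function names are hypothetical.

```python
def encode_text(text, pad_to=4):
    """Encode text and pad with zero octets to a multiple of pad_to."""
    data = text.encode("utf-8")  # assumption: UTF-2 matches today's UTF-8
    padding = (-len(data)) % pad_to
    return data + b"\x00" * padding


def decode_text(data):
    """Strip trailing zero padding, if any, and decode the text."""
    return data.rstrip(b"\x00").decode("utf-8")


def has_multi_octet(data):
    """True if any octet has its most significant bit set."""
    return any(octet & 0x80 for octet in data)
```

A pure US-ASCII string such as "abc" passes through unchanged apart from
the zero padding, while a string containing a non-ASCII character yields
octets with the most significant bit set.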


5 Real-Time Data Transfer Protocol -- RTP



5.1 RTP Fixed Header Fields


The RTP header has the following format:


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver| ChannelID |P|S|  format   |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

H. Schulzrinne/S. Casner             Expires 11/01/93              [Page 10]


INTERNET-DRAFT          draft-ietf-avt-rtp-03.txt         September 15, 1993

|     timestamp (seconds)       |     timestamp (fraction)      |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| options ...                                                   |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+


The first  eight  octets  are present  in  every  RTP packet  and  have  the
following meaning:


protocol version: 2 bits
    Identifies the  protocol version.   The version  number of the  protocol
    defined in this memo is one (1).

channel ID: 6 bits
    The channel  identifier  field forms  part of  the tuple  identifying  a
    channel (see definition in Section 3) to provide an additional  level of
    multiplexing at  the RTP  layer.   The  channel ID  field is  convenient
    if several  different  channels are  to receive  the same  treatment  by
    the underlying layers  or if a profile  allows for the concatenation  of
    several RTP packets  on different channels into  a single packet of  the
    underlying protocol layer.

option present bit (P): 1 bit
    This flag has a value of one (1) if the fixed RTP header is  followed by
    one or more options and a value of zero otherwise.

end-of-synchronization-unit (S): 1 bit
    This flag has  a value of  one in the last  packet of a  synchronization
    unit, a value of zero otherwise.  [As shown in Appendix A, the beginning
    of a  synchronization unit can  be readily established  from this  flag.
    If this  flag were  to signal the  beginning of  a synchronization  unit
    instead, the end of  a synchronization unit could not be established  in
    real time.]

format: 6 bits
    The  format field  forms  an index  into  a table  defined  through  the
    RTCP  FMT  option  or   non-RTP  mechanisms  (see  Section  3).      The
    mapping establishes  the format of  the RTP payload  and determines  its
    interpretation by  higher layers.   If  no mapping has  been defined  in
    this manner,  a standard mapping is  specified by the companion  profile
    document, RFC TBD. Also,  default formats may be defined by the  current
    edition of the Assigned Numbers RFC.

sequence number: 16 bits
    The sequence number counts RTP packets.  The sequence  number increments
    by one  for  each packet  sent.   The  sequence number  may be  used  by
    the receiver to  detect packet loss, to  restore packet sequence and  to
    identify packets to the application.

timestamp: 32 bits
    The timestamp  reflects the  wall clock  time  when the  RTP packet  was
    generated.   Several consecutive RTP  packets may have equal  timestamps
    if they are generated at once.  The timestamp consists of the  middle 32
    bits of a 64-bit  NTP timestamp, as defined in  RFC 1305 [5].  That  is,
    it counts  time since 0 hours  UTC, January 1,  1900, with a  resolution
    of  65536  ticks per  second.    (UTC  is  Coordinated  Universal  Time,
    approximately equal  to the historical  Greenwich Mean Time.)   The  RTP
    timestamp wraps around approximately every 18 hours.

    The timestamp  of  the first  packet within  a synchronization  unit  is
    expected  to  closely reflect  the  actual  sampling  instant,  measured
    by  the local  system  clock.    If  possible, the  local  system  clock
    should be  controlled by a  time synchronization protocol  such as  NTP.
    However,  it  is  allowable to  operate  without  synchronized  time  on
    those systems  where it is  not available, unless  a profile or  session
    protocol requires  otherwise.   It  is  not necessary  to reference  the
    local  system  clock  to obtain  the  timestamp  for  the  beginning  of
    every synchronization  unit, but  the local clock  should be  referenced
    frequently enough  so that clock drift  between the synchronized  system
    clock and the sampling clock  can be compensated for gradually.   Within
    one synchronization  unit, it may be  appropriate to compute  timestamps
    based on  the logical  timing relationships between  the packets.    For
    audio samples, for example, the nominal sampling interval may be used.
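
The layout of the eight fixed octets described above can be sketched in
Python as follows.  This is an illustration only, not a reference
implementation: the function and key names are hypothetical, and the
conversion from a Unix time value assumes the usual offset of 2208988800
seconds between the NTP epoch (1900) and the Unix epoch (1970).

```python
import struct

RTP_VERSION = 1               # version defined in this memo
NTP_UNIX_OFFSET = 2208988800  # seconds from 1900-01-01 to 1970-01-01


def rtp_timestamp(unix_seconds):
    """Middle 32 bits of the NTP timestamp: 16 bits of seconds followed
    by 16 bits of fraction, i.e. 65536 ticks per second."""
    ticks = int((unix_seconds + NTP_UNIX_OFFSET) * 65536)
    return ticks & 0xFFFFFFFF


def pack_header(channel_id, fmt, seq, timestamp, opt=False, end_sync=False):
    """Build the 8-octet fixed header in network byte order."""
    if not (0 <= channel_id < 64 and 0 <= fmt < 64):
        raise ValueError("channel ID and format are 6-bit fields")
    octet0 = (RTP_VERSION << 6) | channel_id
    octet1 = (int(opt) << 7) | (int(end_sync) << 6) | fmt
    return struct.pack(">BBHI", octet0, octet1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF)


def parse_header(data):
    """Decode the 8-octet fixed header into a dictionary."""
    octet0, octet1, seq, timestamp = struct.unpack(">BBHI", data[:8])
    return {
        "version": octet0 >> 6,
        "channel_id": octet0 & 0x3F,
        "option_present": bool(octet1 & 0x80),
        "end_of_sync_unit": bool(octet1 & 0x40),
        "format": octet1 & 0x3F,
        "sequence": seq,
        "timestamp": timestamp,
    }
```

Because only 16 bits of seconds are carried, the value returned by
rtp_timestamp() wraps after 65536 seconds, which is the 18-hour period
mentioned above.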



5.2 The RTP Options


The packet  header  may  be  followed  by  options  and  then  the  payload.
Each option consists  of the  F (final)  bit, the  option type  designation,
a  one-octet  length  field  denoting  the  total  number  of  32-bit  words
comprising the option (including  F bit, type and  length), followed by  any
option-specific data.  The last option before the payload has the F bit  set
to one; for all other options this bit has a value of zero.

An application  may discard  options with  types  unknown to  it.    Private
and experimental options  should use option  types 64 through  127.   Fields
designated as  ``reserved'' or  ``R'' are  set aside  for future  use;  they
should be set to zero by senders and ignored by receivers.

Unless otherwise noted, each option may appear  only once per packet.   Each
packet may contain any number of options.  Options may appear in any  order,
unless specifically restricted by  the option description.   In  particular,
the position of some security options may have significance.

The RTP options have the following type values:







name  value
CSRC      0
SSRC      1
SDST      2
BOS       3
ENC       8
MIC       9
MICA     10
MICK     11
MICS     12
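
As an illustration of the option layout described in this section, the
following sketch walks the option list of one packet.  It assumes that the
buffer starts at the first option (that is, immediately after the fixed
header) and that at least one option is present; how a receiver learns that
options are present is outside this sketch.  The function name is
hypothetical.

```python
def parse_options(buf: bytes):
    """Split a buffer into (type, data) options and the payload offset.

    Assumption for this sketch: buf begins at the first option.
    """
    options = []
    offset = 0
    while True:
        if offset + 2 > len(buf):
            raise ValueError("truncated option header")
        final = buf[offset] >> 7           # F bit: most significant bit
        opt_type = buf[offset] & 0x7F      # 7-bit option type
        length_words = buf[offset + 1]     # total length in 32-bit words
        if length_words == 0:
            raise ValueError("option length of zero")
        end = offset + 4 * length_words
        if end > len(buf):
            raise ValueError("option extends past end of packet")
        # Option-specific data follows the two octets of F/type and length.
        options.append((opt_type, buf[offset + 2:end]))
        offset = end
        if final:                          # last option before the payload
            return options, offset
```

A receiver would typically skip entries whose type it does not recognize,
since options with unknown types may be discarded.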



5.2.1 CSRC: Content source identifiers


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|    CSRC     |    length     | content source identifier    ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The content source option, inserted only by bridges, lists all sources  that
contributed to the packet.  For example, for audio packets, all sources that
were mixed together to create  this packet are enumerated, allowing  correct
talker indication at the receiver.  Each CSRC option may contain one or more
16-bit content source identifiers.  The identifier values must be unique for
all content sources received from  a particular synchronization source on  a
particular channel;  the value of  binary zero  is reserved and  may not  be
used.  If the  number of content sources is  even, the two octets needed  to
pad the list to  a multiple of four  octets are set to  zero.  There  should
only be a single CSRC option within a packet.  If no CSRC option is present,
the content source  identifier is assumed  to have a  value of zero.    CSRC
options are not modified by RTP-level translators.

A conformant RTP  implementation does  not have to  be able  to generate  or
interpret the CSRC option.
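
The padding rule above can be sketched as follows.  The two octets holding
the F bit, type and length plus an odd number of 16-bit identifiers already
fill a whole number of 32-bit words, so padding is needed only for an even
count.  The function name and argument names are assumptions made for this
illustration.

```python
import struct

CSRC = 0  # option type value from the table of RTP option types


def build_csrc_option(identifiers, final=False):
    """Encode a CSRC option holding the given 16-bit identifiers."""
    if not identifiers:
        raise ValueError("at least one content source identifier is required")
    for ident in identifiers:
        if not 1 <= ident <= 0xFFFF:
            raise ValueError("identifier must be in 1..65535; zero is reserved")
    body = b"".join(struct.pack("!H", ident) for ident in identifiers)
    if len(identifiers) % 2 == 0:
        body += b"\x00\x00"                # two zero octets of padding
    length_words = (2 + len(body)) // 4    # includes F bit, type and length
    if length_words > 255:
        raise ValueError("too many identifiers for a one-octet length field")
    first = (0x80 if final else 0x00) | CSRC
    return bytes([first, length_words]) + body
```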



5.2.2 SSRC: Synchronization source identifier


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|    SSRC     |  length = 1   |          identifier           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The SSRC option may  be inserted by RTP-level  translators, end systems  and
bridges.   It is  typically used only  by translators,  but it  may be  used
by an end system  application to distinguish several  sources sent with  the
same transport source address.   Multiple  synchronization sources with  the
same transport  source address  (e.g., the  same IP  address and  UDP  port)
must each insert a  distinct SSRC identifier.   Conversely,  synchronization
sources that are distinguishable by  their transport address do not  require
the use of  SSRC options.   The SSRC  value zero is  reserved; the  receiver
treats the packet  as if  the SSRC  option were  not present.    If no  SSRC
option is present, the transport source  address is assumed to indicate  the
synchronization source.   There must  be no  more than one  SSRC option  per
packet; thus, a  translator must remap  the SSRC  identifier of an  incoming
packet into a new, locally unique SSRC  identifier.  The SSRC option can  be
viewed as an  extension of  the source port  number in  protocols like  UDP,
ST-II or TCP.

An RTP receiver  must support the  SSRC option.   RTP senders  only need  to
support this option if they intend to send more than one source to the  same
channel using the same source port.
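
One way a receiver might apply these rules is to key its per-source state
on the pair of transport source address and SSRC identifier.  The sketch
below is only an illustration of that idea; the function name and the
representation of a transport address as an (address, port) tuple are
assumptions.

```python
def synchronization_source(transport_addr, ssrc=None):
    """Return a key that identifies the synchronization source.

    An absent SSRC option and the reserved SSRC value zero are treated
    alike: the transport source address alone identifies the source.
    """
    if ssrc is None or ssrc == 0:
        return (transport_addr, 0)
    return (transport_addr, ssrc)
```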



5.2.3 BOS: Beginning of synchronization unit


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|     BOS     |   length = 1  |        sequence number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The sequence number field of the BOS option contains the sequence number of the
first packet within the current synchronization unit.  The BOS option allows
the receiver to compute the offset of a packet with respect to the beginning
of the  synchronization  unit,  even if  the  last packet  of  the  previous
synchronization unit was lost.   It is expected that many applications  will
be able to tolerate such a loss, and so will not use the BOS option but rely
on the S bit.
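
The offset computation enabled by the BOS option amounts to a subtraction
modulo the size of the sequence number space.  The sketch below assumes a
16-bit sequence number, matching the width of the field in the BOS option;
the function name is hypothetical.

```python
def offset_in_sync_unit(packet_seq: int, bos_seq: int) -> int:
    """Offset of a packet from the first packet of its synchronization unit.

    Both arguments are 16-bit sequence numbers; the subtraction is taken
    modulo 65536 so that wrap-around is handled.
    """
    return (packet_seq - bos_seq) & 0xFFFF
```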




5.3 Reverse-Path Option


With two-party (unicast)  communications, having  a receiver  of data  relay
back control information to the sender  is straightforward.  Similarly,  for
multicast communications,  control information  can easily  be sent  to  all
members of the  group.   It may,  however, be  desirable to  send a  unicast
message to a  single member  of a multicast  group, for  example to  request
retransmission of a  particular data  frame or to  request/send a  reception
quality report.   For  this particular  use,  RTP includes  a mechanism  for
sending so-called reverse RTP packets.  The format of reverse RTP packets is
exactly the same as  for regular RTP  packets and they can  make use of  all
the options defined in  this memorandum, except SSRC,  as appropriate.   The
support for and  semantics of particular  options are to  be specified by  a
profile.  Reverse RTP packets travel through the same translators as forward
RTP packets.   A  site distinguishes reverse  RTP packets  from forward  RTP
packets by their  arrival port.    Reverse RTP  packets arrive  on the  same
port that the site  uses as a  source port for  forward (data) RTP  packets.
Only reverse RTP packets carry the SDST  option; if RTP packets are  carried
directly within IP  or other network-layer  protocols, the  presence of  the
SDST option signals that the packet is a reverse RTP packet.

A receiver of  reverse RTP  packets cannot  rely on  sequence numbers  being
consecutive, as a sender  is allowed to use  the same sequence number  space
while communicating  through this  reverse  path with  several  sites.    In
particular, a receiver of  reverse RTP packets cannot  tell by the  sequence
numbers whether it has  received all reverse  RTP packets sent to  it.   The
sequence number space of reverse RTP  packets has to be completely  separate
from that  used for  RTP  packets sent  to  the multicast  group.    If  the
same sequence number  space were used,  the members of  the multicast  group
not receiving  reverse RTP  packets would  detect a  gap in  their  received
sequence number space.    The sender of  reverse RTP  packets should  ensure
that sequence numbers  are unique,  modulo  wrap-around, so  that they  can,
if necessary,  be  used for  matching request  and  response.    (Currently,
no such request-response  mechanism has been  defined.)   As a  hypothetical
example, consider  defining  a  request  to pan  the  remote  video  camera.
After completing  the request,  the receiver  of the  request would  send  a
generic acknowledgement containing  the sequence  number of  the request  to
the requestor as an option (not as  the packet sequence number in the  fixed
header).

The timestamp should  reflect the  approximate sending time  of the  packet.
The channel identifier must  be the same as  that used in the  corresponding
forward RTP packets.

If many receivers send  a reverse RTP  packet in response  to a stimulus  in
the data stream,  the simultaneous  delivery of  a large  number of  packets
back to the data source  can cause congestion for  both the network and  the
destination (this  is known  as an  ``ack implosion'').    Thus reverse  RTP
packets should be used with care,  perhaps with mechanisms such as  response
rate limiting and random delays to spread out the simultaneous delivery.


5.3.1 SDST: Synchronization destination identifier


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|    SDST     |   length = 1  |           identifier          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The SDST option is only inserted by RTP end systems and bridges if they want
to send unicast information to a particular site within the multicast group.
Packets containing an SDST option must  not contain an SSRC option and  vice
versa.   Packets containing an SDST option are always reverse RTP packets.
The SDST option may be used to distinguish reverse RTP packets from  forward
RTP packets if the port-number  mechanism described earlier in this  section
is not available, e.g., because RTP  packets are carried directly within  IP
packets, without UDP.

If a  forward RTP  packet carries  SSRC identifier  X when  sent from  A  to
B, where A  and B  may be  either two  translators or  an end  system and  a
translator, the unicast reverse  RTP packet will carry  an SDST option  with
identifier X from B to A.

Consider the topology shown in Fig. 1.  Assume that all forward RTP  packets
are addressed to destination port 8000.  For the case that B1 wants  to send
a reverse packet to E1, B1 simply sends to the source address and port, that
is, port 17 in this example.  E1 can tell by the arrival on port 17 that the
packet is a reverse packet rather than a regular (forward) packet.

The mechanism is somewhat more complicated  when translators intervene.   We
focus on end system E7.   E7 receives, say,  video from a range of  sources,
E1 through E6  as indicated  by the  arrows.   The transmission  from T2  to
E7 could be either  multicast or unicast.   Assume that E7  wants to send  a
retransmission request, a request to pan the camera, etc., to end system  E4
and only to E4.  E7 may not be able to directly reach E4, as E4 may be using
a network protocol unknown to E7 or be located behind a firewall.  According
to the figure, video transmissions from  E4 reach E7 through T2 with  source
port 63 and SSRC identifier 3.  For the reverse message, E7 sends  a message
to T2, with destination port 63  and SDST identifier 3.   T2 can look up  in
its table that  it sends forward  data coming from  T1 with that  identifier
3.   T2 also  knows that  those messages  from T1  carry SSRC  2 and  arrive
with source port 28.   Just  like E7,  T2 places the  SSRC identifier, 2  in
this case, into the SDST  option and forwards the packet  to T1 at port  28.
Finally, translator T1  consults its table  to find that  it labels  packets
coming from E4,  port 47 with  SSRC value 2  and thus  knows to forward  the
reverse packet to E4, port 47.   T1 can either  place SDST value zero or  no
SDST option into that packet.   Note that E4 cannot directly determine  that
E7 sent the reverse packet, rather than, say,  E6.  If that is important,  a
global identifier as defined for the QOS option needs to be included in  the
reverse packet.
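
The table lookups performed by T2 and T1 in this example can be sketched as
follows.  The class and method names are hypothetical, and the table is
filled by hand here; a real translator would populate it as it assigns
locally unique SSRC identifiers to forward packets.

```python
class ReverseTable:
    """Maps a locally assigned SSRC to its upstream (address, port, SSRC)."""

    def __init__(self):
        self._upstream = {}

    def add(self, local_ssrc, address, port, upstream_ssrc=0):
        """Record where forward packets labelled local_ssrc come from."""
        if local_ssrc == 0:
            raise ValueError("SSRC value zero is reserved")
        if local_ssrc in self._upstream:
            raise ValueError("SSRC identifier is not locally unique")
        self._upstream[local_ssrc] = (address, port, upstream_ssrc)

    def route_reverse(self, sdst):
        """Return (address, port, new SDST value) for a reverse packet."""
        return self._upstream[sdst]


# T2 relabels forward packets from T1 (port 28, SSRC 2) with SSRC 3.
t2 = ReverseTable()
t2.add(3, "T1", 28, upstream_ssrc=2)
# T1 labels forward packets from E4 (port 47, no SSRC) with SSRC 2.
t1 = ReverseTable()
t1.add(2, "E4", 47)
```

With these tables, a reverse packet carrying SDST 3 at T2 is forwarded to
T1 at port 28 with SDST 2, and T1 in turn forwards it to E4 at port 47 with
SDST zero (or without an SDST option).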

Only applications that need to send  or receive reverse control RTP  packets
need to implement the SDST option.




5.4 Security Options


The security  options  below offer  message  integrity,  authentication  and
privacy and  the  combination  of the  three.    Support  for  the  security
options is not mandatory, but  see the discussion for the  ENC option.   The
four message integrity check  options --- MIC, MICA,  MICK and MICS ---  are
mutually exclusive, i.e., only  one of them should be  used in a single  RTP
packet.

Combinations of one of the message integrity check options (MIC, MICA, MICK,
or MICS) and  the encryption  (ENC) option described  below can  be used  to
provide a variety of security services:



confidentiality: Confidentiality  means that only  the intended  receiver(s)
    can  decode  the received  RTP  packets;  for  others,  the  RTP  packet
    contains no  useful  information.   Confidentiality  of the  content  is
    achieved by encryption.   The presence of encryption and the  encryption
    initialization vector is indicated by the ENC option.  [For efficiency
    reasons,  this specification  does not  insist that  content  encryption
    only be  used in conjunction with  message integrity and  authentication
    mechanisms.  In most  cases, it will be obvious to the person  receiving
    the data if he or she does not possess the right encryption key.]

authentication and message integrity: In  combination with  certificates(2),
    the receiver  can ascertain that  the claimed originator  is indeed  the
    originator  of the  data  (authentication) and  that  the data  has  not
    been  altered after  leaving  the sender  (message  integrity).    These
    two  security services  are  provided  by the  message  integrity  check
    options.    Certificates  for MICA  must  be distributed  through  means
    outside of RTP. The  services offered by MICA and MIC/MICK/MICS  differ:
    With  MIC/MICK/MICS, the  receiver  can  only verify  that  the  message
    originated  within  the  group  holding  the  secret key,   rather  than
    authenticate the sender  of the message,  while the MICA option  affords
    true authentication of the sender.

authentication, message integrity, and confidentiality: By   carrying   both
    the  message  integrity  check  and  ENC  option  in RTP  packets,   the
    authenticity, message  integrity and confidentiality  of the packet  can
    be  assured  (subject to  the  limitations  discussed  in  the  previous
    paragraph).

    The  message integrity  check  is applied  first  to all  parts  of  the
    outgoing packet  to be  authenticated, and the  message integrity  check
    option is  prepended to  those parts.    Then the  packet including  the
    message integrity  check option  is  encrypted using  the shared  secret
    key.    The ENC  option  must be  followed  immediately by  the  message
    integrity check  option,  without any  other options  in between.    The
    receiver first  decrypts the octets  following the ENC  option and  then
    authenticates the  decrypted data using the  signature contained in  the
    message integrity check option.

    For this combination of security features and group authentication,  the
    combination ENC and MIC is recommended (instead of MICS or MICK),  as it
    yields the lowest processing overhead.

------------------------------
 2. For a description of certificates see, for example, RFC 1422 or [6].



A message integrity  check option followed  by an ENC  option should not  be
used.   All  message integrity  check options  are computed  over the  fixed
header, the first four octets of the message integrity check option and  the
data, that is,  the remaining  header options  and payload  that follow  the
message integrity check  option.   The MICK option  includes the whole  MICK
option itself in the message integrity check.  The fixed header is protected
to foil replay attacks and reassignment to a different channel.

The message  integrity check  options and  the ENC  option shall  not  cover
the SSRC and  SDST options,  i.e., SSRC  and SDST must  be inserted  between
the fixed header and the  ENC or message integrity  check options; SSRC  and
SDST are subject  to change by  translators that likely  do not possess  the
necessary descriptor table  (see below)  and encryption keys.    Translators
that have the  necessary keys  and descriptor translation  table may  modify
the contents of the  RTP packet, unless  the MICA option  is used (see  MICA
description in Section 5.4.3).

All security options carry  a one-octet descriptor field.   This  descriptor
is an index into  two tables, one for  the message integrity check  options,
one for the  ENC option,  established  by non-RTP means,  containing  digest
algorithms (MD2,  MD5,  etc.),  encryption  algorithms  (DES  variants)  and
encryption keys  or shared  secrets (for  the  MICK option).    All  sources
within the same channel  share the same table;  this reduces per-site  state
information.  The descriptor value may change during a session, for example,
to switch to a different encryption key.

The descriptor value zero selects a  set of default algorithms, namely,  MD5
for the message digest algorithm, DES CBC for the encryption algorithm.


5.4.1 ENC: Encryption


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|     ENC     |   length = 3  |    reserved   |   descriptor  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      DES (CBC) initialization vector, bytes 0 through 3       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      DES (CBC) initialization vector, bytes 4 through 7       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|     ENC     |   length = 1  |    reserved   |   descriptor  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+



All packet data after this option is encrypted, using the encryption key and
symmetric encryption algorithm  specified by  the descriptor field.    Every
encrypted RTP packet must contain this option.   Note that the fixed  header
is specifically not  encrypted because  some fields must  be interpreted  by
translators that will not have access to the key.  The descriptor value  may
change over time to accommodate  varying security requirements or limit  the
amount of ciphertext using the  same key.  For  example, in a job  interview
conducted across a network, the  candidate and interviewers could share  one
key, with a second key set aside  for the interviewers only.  For  symmetric
keys, source-specific keys offer no advantage.

The descriptor value  zero is  reserved for a  default mode  using the  Data
Encryption Standard (DES)  algorithm in  CBC (cipher  block chaining)  mode,
as described  in  Section 1.1  of  RFC 1423  [7].    The  padding  specified
in that section  is to  be used.    The 8-octet  initialization vector  (IV)
may be carried unencrypted  within the ENC option,  generated anew for  each
packet.    If the  ENC  option does  not  contain an  initialization  vector
(indicated by an option length of one), the fixed RTP header is used as  the
initialization vector.   (Using the  fixed RTP header  as the  initialization
vector avoids regenerating  the initialization  vector for  each packet  and
incurs less  header  overhead.)    For  details  on the  tradeoffs  for  CBC
initialization vector use, see [8].  Support for encryption is not required.
Implementations that  do not  support encryption  should recognize  the  ENC
option so  that they  can avoid  processing encrypted  messages and  provide
a meaningful failure  indication.   Implementations that support  encryption
should, at the minimum, always support the DES CBC algorithm.
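
The choice of initialization vector described above can be sketched as
follows.  The sketch assumes that the fixed RTP header is eight octets
long, so that it can serve directly as a DES CBC initialization vector;
that length is not stated in this section and is an assumption of the
example, as is the function name.  No decryption is performed here.

```python
ENC = 8  # option type value from the table of RTP option types


def enc_parameters(option: bytes, fixed_header: bytes):
    """Return (descriptor, initialization vector) for an ENC option."""
    if len(option) < 4 or option[0] & 0x7F != ENC:
        raise ValueError("not an ENC option")
    length_words, descriptor = option[1], option[3]
    if length_words == 3:
        if len(option) < 12:
            raise ValueError("truncated ENC option")
        iv = option[4:12]              # IV carried in the option itself
    elif length_words == 1:
        if len(fixed_header) != 8:     # assumption of this sketch
            raise ValueError("this sketch assumes an 8-octet fixed header")
        iv = fixed_header              # fixed header doubles as the IV
    else:
        raise ValueError("unexpected ENC option length")
    return descriptor, iv
```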



5.4.2 MIC: Message integrity check


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|     MIC     |     length    |    reserved   |   descriptor  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 message digest (unencrypted)                 ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


The MIC option is used only in combination with the ENC
option immediately  preceding it  to provide  privacy and  group  membership
authentication.   The  message  integrity check  uses the  digest  algorithm
specified by the  descriptor field.   (A message  digest is a  cryptographic
hash function that  transforms a  message of  any length  to a  fixed-length
byte string,  where the  fixed-length string  has the  property that  it  is
computationally infeasible to generate  another, different message with  the
same digest.)   The value zero  implies the use of  the MD5 message  digest.
Note that the MIC option is not separately encrypted.




5.4.3 MICA: Message integrity check, asymmetric encryption


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|    MICA     |    length     |         message digest       ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    (asymmetrically encrypted)                ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Currently, only the  use of  the MD2  and MD5 message  digest algorithms  is
defined, as described in RFC  1319 [9] (as corrected  in Section 2.1 of  RFC
1423) and RFC 1321 [10], respectively.  The MD2 and MD5 message digests  are
16 octets long.

RFC 1423, Section 2.1:



    To  avoid any  potential ambiguity  regarding the  ordering  of the
    octets of  an MD2  message  digest that  is input  as a  data value
    to another encryption process  (e.g., RSAEncryption), the following
    holds true.   The first  (or left-most displayed, if  one thinks in
    terms of a  digest's ``print'' representation) octet  of the digest
    (i.e.,  digest[0] as  specified in  RFC 1319),  when  considered as
    an RSA  data value,  has  numerical weight  2**120.   The  last (or
    right-most displayed) octet  (i.e., digest[15] as  specified in RFC
    1319) has numerical weight 2**0.


RFC 1423, Section 2.2:


    To  avoid any  potential ambiguity  regarding the  ordering  of the
    octets of an MD5 message digest that  is input as an RSA data value
    to the  RSA  encryption process,  the following  holds true.    The
    first (or left-most displayed, if one thinks in terms of a digest's
    ``print'' representation) octet of  the digest (i.e., the low-order
    octet of  A as specified  in RFC 1321),  when considered as  an RSA
    data value, has  numerical weight 2**120.   The last (or right-most
    displayed) octet (i.e.,  the high-order octet of D  as specified in
    RFC 1321) has numerical weight 2**0.


The message digest is  encrypted, using asymmetric  keys, with the  sender's
private key using the algorithm described in Section 4.2.1 of RFC 1423:



    As described in PKCS #1, all quantities input as data values to the
    RSAEncryption process shall be properly justified and padded to the
    length of the modulus prior to the encryption process.  In general,
    an RSAEncryption input  value is formed by  concatenating a leading
    NULL octet, a block type BT, a padding string PS, a NULL octet, and
    the data quantity D, that is, RSA input value = 0x00, BT, PS, 0x00,
    D. To  prepare a MIC  for RSAEncryption,  the PKCS #1  ``block type
    01'' encryption-block  formatting scheme  is employed.    The block
    type BT is a single octet containing the value 0x01 and the padding
    string PS is one  or more octets (enough octets  to make the length
    of the complete RSA input value equal to the length of the modulus)
    each containing the value 0xFF. The data quantity D is comprised of
    the MIC and the MIC algorithm identifier.



The encoding is described  in detail in  RFC 1423.   For encrypting MD2  and
MD5, the data quantity  D comprises the 16-octet  checksum, preceded by  the
binary sequences shown here in hexadecimal:   0x30, 0x20, 0x30, 0x0C,  0x06,
0x08, 0x2A, 0x86, 0x48, 0x86, 0xF7, 0x0D, 0x02, 0x02, 0x05, 0x00, 0x04, 0x10
for MD2 and  0x30, 0x20,  0x30, 0x0C, 0x06,  0x08, 0x2A,  0x86, 0x48,  0x86,
0xF7, 0x0D, 0x02, 0x05, 0x05, 0x00, 0x04, 0x10 for MD5.
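
The block formatting quoted above can be sketched as follows for the MD5
case.  The sketch only builds the RSA input value (0x00, BT, PS, 0x00, D);
the RSA private-key operation itself is omitted.  The function names and
the modulus_len parameter (the modulus length in octets) are assumptions
made for this illustration.

```python
import hashlib

# Octets that precede the 16-octet MD5 checksum in the data quantity D.
MD5_PREFIX = bytes([0x30, 0x20, 0x30, 0x0C, 0x06, 0x08, 0x2A, 0x86, 0x48,
                    0x86, 0xF7, 0x0D, 0x02, 0x05, 0x05, 0x00, 0x04, 0x10])


def rsa_input_block(digest: bytes, prefix: bytes, modulus_len: int) -> bytes:
    """Format a digest as a PKCS #1 "block type 01" RSA input value."""
    d = prefix + digest
    ps_len = modulus_len - 3 - len(d)   # 3 = leading 0x00, BT, 0x00
    if ps_len < 1:
        raise ValueError("modulus too short for this digest")
    return b"\x00\x01" + b"\xff" * ps_len + b"\x00" + d


def md5_rsa_input(data: bytes, modulus_len: int) -> bytes:
    """Digest data with MD5 and format the result for RSA encryption."""
    return rsa_input_block(hashlib.md5(data).digest(), MD5_PREFIX, modulus_len)
```

For a 512-bit modulus (64 octets), D occupies 34 octets and the padding
string PS is 27 octets long.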

Contrary to what is  specified in RFC  1423 for privacy  enhanced mail,  the
asymmetrically signed MIC is carried in binary, not represented in the
printable encoding of RFC  1421, Section 4.3.2.4.   The encrypted length  of
the signature will be equal to the length of the modulus of the RSA
encryption used, rounded up to the next integral octet count.  The modulus
and public key are
conveyed to the receivers by non-RTP means.  Asymmetric keys are used  since
symmetric keys would not  allow authentication of  the individual source  in
the multicast case.

The signature is  padded as necessary.   The  value of the  padding is  left
unspecified.  The number of  non-padding bits within the signature is  known
to the receiver  as being equal  to the key  length.   The MIC algorithm  is
identified through the octets prepended to the actual 16-octet signature.

A translator is not  allowed to modify  the parts of  an RTP packet  covered
by the MICA option  as the receiver  would have no  way of establishing  the
identity of the translator  and thus could not  verify the integrity of  the
RTP packet.

Support for sending or interpreting MICA options is not required.



5.4.4 MICK: Message integrity check, keyed


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|    MICK     |    length     |   reserved    |   descriptor  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    message digest (keyed)                    ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This message  integrity  check option  does  not  require encryption.     In
addition to  the RTP  packet parts  to  be included  in the  message  digest
according to the introduction to this  section, the shared secret is  placed
in the MICK option and included in the message digest.  The shared secret is
equivalent to the key used  for the MICS and ENC  options, but is 16  octets
long, padded if needed with  binary zeroes.  The  shared secret in the  MICK
option is then replaced by the computed 16-octet message digest.

The receiver  stores  the  message  digest contained  in  the  MICK  option,
replaces it with the  shared secret key and  computes the message digest  in
the same manner  as the sender.   If  the RTP packet  has not been  tampered
with and has originated with  one of the holders  of the shared secret,  the
computed message digest  will agree with  the digest found  on reception  in
the MICK option.  [The message integrity check follows the practice of SNMP
Version 2, as described in RFC 1446, Section 1.5.1.  The MICK option itself
is covered by the  digest in order to  detect tampering with the  descriptor
field itself.  Using the secret  key in the signature instead of  encrypting
the MD5 message digest avoids the  use of an encryption algorithm when  only
authentication is desired.  However,  the security of this approach has  not
been as well established as  the authentication based on encrypting  message
digests used in the MICS, MIC and MICA options.]
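
The sender and receiver procedures for the keyed check can be sketched as
follows.  The sketch assumes the MD5 digest (the default for descriptor
value zero), an option length of five 32-bit words, and a digest computed
over the fixed header, the whole MICK option with the shared secret in
place of the digest, and the data following the option.  The function
names are hypothetical, and the fixed header is treated as opaque octets.

```python
import hashlib
import hmac

MICK = 11  # option type value from the table of RTP option types


def _secret_field(secret: bytes) -> bytes:
    """Pad the shared secret with binary zeroes to 16 octets."""
    if len(secret) > 16:
        raise ValueError("shared secret longer than 16 octets")
    return secret.ljust(16, b"\x00")


def mick_option(fixed_header, following, secret, descriptor=0, final=False):
    """Build a MICK option protecting fixed_header and following data."""
    head = bytes([(0x80 if final else 0x00) | MICK, 5, 0, descriptor])
    covered = fixed_header + head + _secret_field(secret) + following
    # The secret placed in the option is replaced by the computed digest.
    return head + hashlib.md5(covered).digest()


def mick_verify(fixed_header, option, following, secret) -> bool:
    """Recompute the digest as the sender did and compare."""
    head, received = option[:4], option[4:20]
    covered = fixed_header + head + _secret_field(secret) + following
    return hmac.compare_digest(received, hashlib.md5(covered).digest())
```

Because the first four octets of the option are part of the digested data,
a change to the descriptor field is detected in the same way as a change
to the payload.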



5.4.5 MICS: Message integrity check, symmetric-key encrypted


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|    MICS     |    length     |   reserved    |   descriptor  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           message digest (symmetrically encrypted)           ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This message integrity check encrypts the message digest using DES ECB  mode
as described in RFC 1423, Section 3.1.




6 Real Time Control Protocol --- RTCP


The real-time control protocol (RTCP)  conveys minimal control and  advisory
information  during  a  session.      It  provides  support  for   ``loosely
controlled'' sessions,  i.e.,  where participants  enter and  leave  without
membership control and parameter negotiation.  The services provided by RTCP
augment RTP, but an end system does not have to implement RTCP
features to participate in sessions.   There is one exception to this  rule:
if an application sends  FMT options,  the receiver has  to decode these  in
order to properly interpret the RTP payload.   RTCP does not aim to  provide
the services of  a session  control protocol and  does not  provide some  of
the services desirable for  two-party conversations.   If a session  control
protocol is in use, the services of RTCP should not be required.  (As of the
writing of this document, a session  or conference control protocol has  not
been specified within the Internet.)

RTCP options share the  same structure and numbering  space as RTP  options,
which are  described  in  Section  5.     Unless  otherwise noted,   control
information is  carried periodically  as options  within RTP  packets,  with
or without payload.   RTCP  packets are sent  to all members  of a  session.
These packets are part of the same sequence number space as RTP packets  not
containing RTCP options.    The period should  be varied  randomly to  avoid
synchronization of all sources and its mean should increase with the  number
of participants in the  session to limit the  growth of the overall  network
and host interrupt load.  The length of the period determines, for  example,
how long a receiver joining a session has to wait until it can identify  the
source.  A receiver may remove from its list of active sites a site that it
has not heard from for a given time-out period; the time-out period may
depend on the number of sites  or the observed average interarrival time  of
RTCP messages.  Note that not every periodic message has to contain all RTCP
options; for example, the  EMAIL part within the  SDES option might only  be
sent every few messages.  RTCP options should also be sent when  information
carried in RTCP options changes, but  the generation of RTCP options  should
be rate-limited.
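
One possible way to choose the period between RTCP transmissions is
sketched below.  This memorandum does not prescribe a formula; the linear
growth with the number of participants, the base period of five seconds,
and the uniform spread of plus or minus 50 percent around the mean are all
assumptions made for this illustration.

```python
import random


def next_report_interval(participants, base=5.0, rng=random):
    """Return a randomized delay in seconds until the next RTCP message.

    The mean grows with the number of participants so that the aggregate
    rate of RTCP messages in the session stays roughly constant.
    """
    mean = base * max(1, participants)
    # Vary the period randomly to avoid synchronization of all sources.
    return rng.uniform(0.5 * mean, 1.5 * mean)
```

Passing a seeded random.Random instance as rng makes the delays
reproducible, which may be convenient for testing.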

The option types are defined below:



6.1 FMT: Format description


  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |F|     FMT     |    length     |R|R|  format   |    reserved   |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                    format-dependent data                     ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


format: 6 bits
    The format field  corresponds to the index  value from the format  field
    in the RTP fixed header, with values ranging from 0 to 63.

Format-dependent data: variable length
    Format-dependent data  may or may  not appear in  a FMT option.   It  is
    passed to the next layer and not interpreted by RTP.

A FMT mapping changes the interpretation of a given format value carried  in
the fixed RTP header starting at the packet containing the FMT option.   The
new interpretation applies  only to  packets from  the same  synchronization
source as the packet containing the FMT option.  If format mappings are
changed through the FMT option, the option should be sent periodically.
Otherwise, sites that missed the FMT option, either through packet loss or
because they joined the session after it was sent, will not know how to
interpret the particular format value.
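
As an illustration of the layout shown in the figure, the following sketch
decodes a FMT option from a byte buffer in network byte order.  The type
fmt_option and the function fmt_parse are hypothetical names introduced for
this example.  The sketch assumes that the length field counts 32-bit words
including the first word of the option, as for the other options.

```c
#include <stddef.h>
#include <stdint.h>

#define RTP_OPT_FMT 32   /* FMT option type, as listed in Appendix A */

typedef struct {
  int            final;     /* F bit: last option in the packet */
  unsigned       length;    /* option length in 32-bit words */
  unsigned       format;    /* format index, 0 to 63 */
  const uint8_t *data;      /* format-dependent data, NULL if absent */
  size_t         data_len;  /* length of that data in octets */
} fmt_option;

/* Decode the FMT option at p; returns 0 on success, -1 if malformed. */
int fmt_parse(const uint8_t *p, size_t avail, fmt_option *out)
{
  if (avail < 4) return -1;
  if ((p[0] & 0x7f) != RTP_OPT_FMT) return -1;

  out->final  = (p[0] >> 7) & 1;
  out->length = p[1];
  if (out->length == 0 || (size_t)out->length * 4 > avail) return -1;

  out->format   = p[2] & 0x3f;      /* the two R bits are ignored */
  out->data_len = (size_t)out->length * 4 - 4;
  out->data     = out->data_len ? p + 4 : NULL;
  return 0;
}
```

The format-dependent data is returned uninterpreted, since it is passed to
the next layer and not interpreted by RTP.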




6.2 SDES: Source descriptor


  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |F|     SDES    |    length     |       source identifier       |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |  type = ADDR  |    length     |    reserved   | address type  |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                     network-layer address                    ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |  type = ADDR  |   length = 2  |    reserved   | addr. type = 1|
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                          IPv4 address                         |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |  type = PORT  |   length = 1  |             port              |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |  type = PORT  |   length > 1  |    reserved   |    reserved   |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                             port                             ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |  type = CNAME |    length     | user and domain name         ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |  type = EMAIL |    length     | electronic mail address      ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 |  type = NAME  |    length     | common name of source        ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |   type = LOC  |    length     | geographic location of site  ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |   type = TXT  |    length     | text describing source       ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


The SDES option provides a mapping  between a numeric source identifier  and
one or more items identifying the source.  [Several attributes were combined
into one  option so  that the  receiver does  not have  to perform  multiple
mappings from identifiers to site data structures.]  For those  applications
where the size of a multi-item SDES option would be a concern, multiple SDES
options may be  formed with  subsets of  the items  to be  sent in  separate
packets.

A bridge uses an identifier  value of zero within  the SDES option to  refer
to itself rather  than content  sources bridged  by it.    For each  content
source, a bridge forwards  the SDES information  received from that  source,
but changes the SDES source identifier to the value used in the CSRC  option
when identifying that content source.  A bridge that contributes local  data
to outgoing packets  should select  another non-zero  source identifier  for
that traffic and send CSRC and SDES options for it.

Translators do not modify or insert SDES  options.  The end system  performs
the same  mapping  it  uses  to  identify  the  content  sources  (that  is,
the combination of  network source,  synchronization source  and the  source
identifier within this SDES option) to  identify a particular source.   SDES
information is  specific to  a particular  channel, unless  a profile  or  a
higher-layer control protocol defines that all packets with the same  source
identifier (network and  transport-level source addresses  and the  optional
SSRC value)  from a  set of  channels defined  by the  control protocol  are
described by the same SDES.

Currently, the items listed in  Table 2 are defined.   Each has a  structure
similar to that of RTCP and RTP  options, that is, a type field followed  by
a length field, measured  in multiples of  four octets.   No final bit  (see
Section 5.2) is needed since  the overall length is known.   Text items  are
encoded according to  the rules in  Section 4.   All of  the SDES items  are
optional; however, if quality-of-service monitoring is to be used, one  ADDR
item and the PORT item are mandatory, as described for the QOS option.  Only
the TXT item is expected to change during a session.  Item types 128
through 255 are reserved for private or experimental extensions.
Items are padded with  the binary value  zero to the  next multiple of  four
octets.  Each item may appear only once unless otherwise noted.
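
The item structure described above can be traversed as in the following
sketch.  The function name sdes_find_item is hypothetical.  The sketch
assumes that it is given the octets following the first word of the SDES
option, and that each item's length field counts 32-bit words including the
type and length octets, as the PORT and ADDR figures suggest.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Search the items of an SDES option for the first item of the given
 * type.  Returns 1 if found, 0 if not found and -1 if an item has a
 * zero length or extends beyond the buffer.  On success, *body points
 * just past the type and length octets and *body_len is the number of
 * octets that follow, including any zero padding.
 */
int sdes_find_item(const uint8_t *items, size_t len, unsigned type,
                   const uint8_t **body, size_t *body_len)
{
  size_t off = 0;

  while (off < len) {
    size_t item_len;

    if (len - off < 2) return -1;            /* no room for type/length */
    item_len = (size_t)items[off + 1] * 4;   /* length in 32-bit words */
    if (item_len == 0 || item_len > len - off) return -1;

    if (items[off] == type) {
      *body     = items + off + 2;
      *body_len = item_len - 2;
      return 1;
    }
    off += item_len;
  }
  return 0;
}
```

Text items still have to be decoded according to the rules in Section 4;
the sketch only locates them.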

A more detailed description of the content of some of these items follows:




          type   value  description
          ADDR   1      network address of source
          PORT   2      source port
          CNAME  4      canonical user and host identifier,
                        e.g., ``doe@sleepy.megacorp.com'' or
                        ``sleepy.megacorp.com''
          EMAIL  5      user's electronic mail address,
                        e.g., ``John.Doe@megacorp.com''
          NAME   6      common name describing the source,
                        e.g., ``John Doe, Bit Recycler, Megacorp''
          LOC    8      geographic user location,
                        e.g., ``Rm.  2A244, Murray Hill, NJ''
          TXT    16     text describing the source,
                        e.g., ``out for lunch''


                      Table 2:  Summary of SDES items

ADDR: This item  contains the network  address of the  source, for  example,
    the IP version  4 address or an NSAP.  The address is carried in  binary
    form,  not as  ``dotted decimal''  or similar  human-readable form.    A
    source  may send  several  network  addresses,  but only  one  for  each
    address type value.  Address types are identified by the Domain Name
    System Resource Record (RR) type, as specified in the current edition
    of the Assigned Numbers RFC.

PORT: If the  length field is one, the  transport selector, such as the  UDP
    port number, is carried as  octets three and four in the first and  only
    word of  the item.   If  the length field  is greater  than one,  octets
    three and  four are  zero and the  transport selector  appears in  words
    two and  following of  this item,  in network byte  order.   The  figure
    shows the use  of the PORT item  for the TCP and  UDP protocols.   There
    must  be no  more than  one PORT  item  in an  SDES option.    The  PORT
    item should immediately precede any ADDR items.  [Multiple concurrent
    transport  addresses  are not  meaningful.     The  ordering  simplifies
    processing at  the receiver,  as the  consecutive octet  string of  PORT
    followed by the first ADDR can be used as a globally  unique identifier.
    The transport protocol does  not need to be identified, as the  receiver
    will only see one type of transport protocol for a session.]

CNAME: The CNAME item must have the format ``user@host'' or  ``host'', where
    ``host'' is the fully qualified  domain name of the host from which  the
    real-time data  originates, formatted according  to the rules  specified
    in RFC  1034,  RFC 1035  and Section  2.1 of  RFC 1123.    The  ``host''
    form  may be  used if  a user  name  is not  available, for  example  on
    single-user systems.  The  user name should be in a form that a  program
    such as  ``finger'' or  ``talk'' could use,  i.e., it  typically is  the
    login name  rather than  the ``real  life'' name.   Note  that the  host
    name is not necessarily identical to the electronic mail address of the
    participant.  The latter is provided through the EMAIL item.

LOC: Depending  on  the   application,  different  degrees  of  detail   are
    appropriate  for this  item.    For conference  applications,  a  string
    like  ``Murray Hill,  New  Jersey'' may  be sufficient,  while,  for  an
    active badge system,  strings like ``Room 2A244,  AT&T BL MH'' might  be
    appropriate.  The degree of detail is left to the  implementation and/or
    user, but format and content may be prescribed by a profile.





6.3 BYE: Goodbye


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|     BYE     | length = 1    |   content source identifier   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The BYE option indicates that a particular session participant is no  longer
active.  A bridge sends BYE options with a (non-zero) content source  value.
An identifier  value of  zero indicates  that the  source indicated  by  the
synchronization source  (SSRC) option  and transport  address is  no  longer
active.  If a  bridge shuts down, it should  first send BYE options for  all
content sources it  handles, followed  by a  BYE option  with an  identifier
value of zero.  Each RTCP message can contain one or more BYE options.
Multiple identifiers  in a  single  BYE option  are  not allowed,  to  avoid
ambiguities between the special value of zero and any necessary padding.
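
A BYE option occupies a single word, so it can be assembled as in the
following sketch.  The function name bye_write is hypothetical, and the
sketch assumes that the identifier is carried in network byte order.

```c
#include <stdint.h>

#define RTP_OPT_BYE 35   /* BYE option type, as listed in Appendix A */

/*
 * Write one BYE option into out[0..3].  A csrc_id of zero refers to the
 * synchronization source itself; final marks the last option in the
 * packet.
 */
void bye_write(uint8_t out[4], int final, uint16_t csrc_id)
{
  out[0] = (uint8_t)((final ? 0x80 : 0x00) | RTP_OPT_BYE);
  out[1] = 1;                          /* length: one 32-bit word */
  out[2] = (uint8_t)(csrc_id >> 8);
  out[3] = (uint8_t)(csrc_id & 0xff);
}
```

A bridge that shuts down would write one such option per content source it
handles, followed by a final one with an identifier of zero.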



6.4 QOS: Quality of service measurement


  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |F|     QOS     |    length     |    reserved   |    reserved   |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                       packets expected                        |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                       packets received                        |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |    minimum delay (seconds)    |    minimum delay (fraction)   |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |    maximum delay (seconds)    |    maximum delay (fraction)   |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |    average delay (seconds)    |    average delay (fraction)   |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |  type = PORT  |    length     |   transport address          ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |  type = ADDR  |    length     |    reserved   | address type  |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                     network-layer address                    ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


The QOS  option  conveys  statistics  of  a  single  synchronization  source
belonging to the channel  identified by the  multicast address,  destination
port and channel identifier.   The synchronization  source is identified  by
appending the first of the ADDR items  together with the PORT item from  the
SDES option.   These SDES  items are appended  directly to the  fixed-length
part of the QOS option, with PORT preceding ADDR. For a description of these
items, see the SDES option.

If the QOS option is used  in reverse control packets, the destination  port
number identifies the channel, along with the channel identifier.  For  that
reason, every  multicast group  should be  associated with  a unique  source
port.

The other fields of the option contain the number of packets expected, the
number of packets received, the minimum delay, the maximum delay and the
average delay.  The expected number of packets may be computed according to
the algorithm in Section A.5.  The delay measures are in units of 1/65536 of
a second, that is,  with the same resolution as  the timestamp in the  fixed
RTP header.
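
Since each delay field consists of 16 bits of seconds followed by 16 bits of
fraction, it can be treated as one 32-bit count of 1/65536 second units.
The following sketch converts between seconds and that representation.  The
function names are hypothetical, and clamping out-of-range values to the
smallest and largest representable delay is an assumption; the draft does
not say how such values are to be reported.

```c
#include <stdint.h>

/* Encode a delay in seconds as a 32-bit count of 1/65536 s units. */
uint32_t qos_delay_encode(double seconds)
{
  double units = seconds * 65536.0 + 0.5;   /* round to nearest unit */

  if (!(units >= 1.0)) return 0;            /* negative, tiny or NaN */
  if (units >= 4294967296.0) return 0xffffffffu;
  return (uint32_t)units;
}

/* Decode a 32-bit delay field back into seconds. */
double qos_delay_decode(uint32_t field)
{
  return (double)field / 65536.0;
}
```

The upper 16 bits of the encoded value are the whole seconds and the lower
16 bits are the fraction, matching the two halves of each delay word in the
figure.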

A single RTCP packet  may contain several QOS  options.  It  is left to  the
implementor to decide how  often to transmit QOS  options and which  sources
are to be included.




7 Security Considerations


Without the use of the security options described in Section 5.4, RTP
suffers from the same security deficiencies as the underlying protocols: an
impostor can, for example, fake source or destination network addresses, or
change the header or payload without detection.  In particular, the SDES
fields may be used to impersonate another participant.

IP multicast provides no direct means for a sender to know all the receivers
of the  data sent.    RTP  options  make it  easy  for all  participants  in
a session  to identify  themselves;  if deemed  important for  a  particular
application, it  is the  responsibility of  the application  writer to  make
listening without identification difficult.   It  should be noted,  however,
that privacy of the payload can generally be assured only by encryption.


The periodic transmission of session messages may make it possible to detect
denial-of-service attacks, as the receiver  can detect the absence of  these
expected messages.

Unlike for other data, ciphertext-only attacks may be more difficult for
compressed audio  and video  sources.   Such  data is  very close  to  white
noise, making  statistics-based ciphertext-only  attacks  difficult.    Even
without message  integrity  check  options,  it  may  be  difficult  for  an
attacker to  detect  automatically when  he  or  she has  found  the  secret
cryptographic key  since  the  bit  pattern  after  correct  decryption  may
not look  significantly different  from one  decrypted with  the wrong  key.
However, the session information is  more or less constant and  predictable,
allowing known-plaintext  attacks.    Chosen-plaintext  attacks  appear,  in
general, to be difficult.

The integrity of the timestamp in the  fixed RTP header can be protected  by
the message integrity options.  If  clocks are known to be synchronized,  an
attacker only has a very limited time window of maybe a few seconds every 18
hours to replay recorded RTP without detection by the receiver.

Key  distribution  and   certificates  are   outside  the   scope  of   this
document.



8 RTP over Network and Transport Protocols


This section  describes  issues  specific to  carrying  RTP  packets  within
particular network and transport protocols.


8.1 Defaults


The following rules apply unless superseded by protocol-specific subsections
in this section.  The rules apply to both forward and reverse RTP packets.

RTP packets contain no length field or other delineation, so that a  framing
mechanism is  needed  if  they  are carried  in  underlying  protocols  that
provide the  abstraction of  a continuous  bit stream  rather than  messages
(packets).  TCP is an  example of such a protocol.   Framing is also  needed
if the underlying  protocol may contain  padding so that  the extent of  the
RTP payload cannot  be determined.    For these  cases, each  RTP packet  is
prefixed by a 32-bit framing field  containing the length of the RTP  packet
measured in octets,  not including  the framing  field itself.    If an  RTP
packet traverses a path over a mixture of octet-stream and  message-oriented
protocols, each RTP-level bridge between these protocols is responsible  for
adding and removing the framing field.
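
The framing rule can be sketched as follows.  The function names are
hypothetical, and the sketch assumes that the 32-bit framing field is sent
in network byte order, which the text above does not state explicitly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Prefix pkt with its length in octets.  Returns the number of octets
 * written to out, or 0 if out is too small.
 */
size_t rtp_frame_write(uint8_t *out, size_t out_cap,
                       const uint8_t *pkt, size_t pkt_len)
{
  if (pkt_len > 0xffffffffu || out_cap < 4 || out_cap - 4 < pkt_len)
    return 0;
  out[0] = (uint8_t)(pkt_len >> 24);
  out[1] = (uint8_t)(pkt_len >> 16);
  out[2] = (uint8_t)(pkt_len >> 8);
  out[3] = (uint8_t)pkt_len;
  memcpy(out + 4, pkt, pkt_len);
  return pkt_len + 4;
}

/*
 * Look for one complete framed packet at the start of buf.  Returns 1
 * and sets *pkt, *pkt_len and *consumed if one is present, or 0 if
 * more octets must be read from the stream first.
 */
int rtp_frame_read(const uint8_t *buf, size_t avail,
                   const uint8_t **pkt, size_t *pkt_len, size_t *consumed)
{
  uint32_t n;

  if (avail < 4) return 0;
  n = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
      ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
  if (avail - 4 < n) return 0;

  *pkt      = buf + 4;
  *pkt_len  = n;
  *consumed = (size_t)n + 4;
  return 1;
}
```

A receiver on an octet stream would call the read routine repeatedly on its
input buffer, discarding the consumed octets after each packet.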

A profile may determine that this framing method is to be used even when RTP
is carried in protocols that do provide framing in order to allow carrying
several RTP packets in  one lower-layer protocol  data unit,  such as a  UDP
packet.   Carrying several RTP  packets in one  network or transport  packet
reduces header overhead and  may simplify synchronization between  different
streams.



8.2 ST-II


When used in conjunction  with RTP, ST-II [11]  service access ports  (SAPs)
have a length of 16  bits.  The  next protocol field (``NextPCol'',  Section
4.2.2.10 in RFC 1190) is used to distinguish two encapsulations of RTP  over
ST-II.  The first uses NextPCol value TBD and places the RTP packet directly
into the ST-II data area.  The second uses a different NextPCol value, also
TBD; in this case the RTP header is preceded by the 32-bit header shown
below.  The octet count determines the
number of octets  in the  RTP header  and payload to  be checksummed.    The
16-bit checksum uses the TCP and UDP checksum algorithm.


  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | count of octets to be checked |           check sum           |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |       RTP packet (fixed header, options and payload)         ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
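
For reference, the 16-bit one's complement checksum used by TCP and UDP can
be computed as in the following sketch.  The function name inet_checksum is
hypothetical.  The sketch covers only the octets handed to it; it makes no
assumption about a pseudo-header or about whether the 32-bit header above
is itself included, since the text does not say.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's complement of the one's complement sum of the data, taken as
 * 16-bit big-endian words.  An odd trailing octet is padded with a
 * zero octet on the right.
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
  uint64_t sum = 0;

  while (len > 1) {
    sum  += ((uint32_t)data[0] << 8) | data[1];
    data += 2;
    len  -= 2;
  }
  if (len == 1)
    sum += (uint32_t)data[0] << 8;

  while (sum >> 16)                 /* fold carries back into 16 bits */
    sum = (sum & 0xffff) + (sum >> 16);

  return (uint16_t)~sum;
}
```

A receiver can verify a packet by running the same computation over the
checked octets followed by the received checksum; the result is zero if the
data is intact.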


A Implementation Notes


We describe aspects of the receiver  implementation in this section.   There
may be other implementation methods that are faster in particular  operating
environments or have other advantages.   These implementation notes are  for
informational purposes only.

The  following  definitions  are  used  for  all  examples;   the  structure
definitions are valid for 32-bit big-endian architectures only.  Bit  fields
are assumed to be packed tightly, with no additional padding.



#include <sys/types.h>

typedef double CLOCK_t;

typedef enum {
  RTP_CSRC   = 0,
  RTP_SSRC   = 1,
  RTP_SDST   = 2,
  RTP_BOS    = 3,
  RTP_ENC    = 8,
  RTP_MIC    = 9,
  RTP_MICA   = 10,
  RTP_MICK   = 11,
  RTP_MICS   = 12,
  RTP_FMT    = 32,
  RTP_SDES   = 34,
  RTP_BYE    = 35,
  RTP_QOS    = 36
} rtp_option_t;

typedef struct {
  unsigned int ver:2;      /* version number */
  unsigned int channel:6;  /* channel id */
  unsigned int o:1;        /* option present */
  unsigned int s:1;        /* sync bit */
  unsigned int format:6;   /* content type */
  u_short seq;             /* sequence number */
  u_long  ts;              /* time stamp */
} rtp_hdr_t;

typedef union {
  struct {
    unsigned int final:1;  /* final option */
    unsigned int type:7;   /* option type */
    u_char length;         /* length, including type/length */
    u_short id[1];         /* content source identifier(s) */
  } csrc;
  /* ... */
} rtp_t;



A.1 Timestamp Recovery


For some applications  it is  useful to  have the  receiver reconstruct  the
sender's high-order  bits of  the  NTP timestamp  from the  received  32-bit
RTP timestamp.    The following  code uses  double-precision floating  point
numbers for  whole numbers  with a  48-bit range.    Other type  definitions
of CLOCK_t may be  appropriate for different  operating environments,  e.g.,
64-bit architectures  or systems  with slow  floating point  support.    The
routine applies to any clock frequency, not just the RTP value of 65,536 Hz,
and any clock starting  point.  It  will reconstruct the correct  high-order
bits as long as the local clock  now is within one half of wrap-around  time
of the 32-bit timestamp, e.g., approximately 9.1 hours for RTP timestamps.



#include <math.h>


#define MOD32bit 4294967296.
#define MAX31bit 0x7fffffff

/* ts:  in: timestamp, low-order 32 bits */
/* now: in: current local time */
CLOCK_t clock_extend(u_long ts, CLOCK_t now)
{
  u_long high, low;   /* high and low order bits of 48-bit clock */

  low  = fmod(now, MOD32bit);
  high = now / MOD32bit;

  if (low > ts) {
    /* ts is more than half a cycle behind now: ts has already wrapped */
    if (low - ts > MAX31bit) high++;
  }
  else {
    /* ts is more than half a cycle ahead of now: ts predates the wrap */
    if (ts - low > MAX31bit) high--;
  }
  return high * MOD32bit + ts;
} /* clock_extend */




Using the full timestamp internally has the advantage that the remainder  of
the receiver code does not have to be concerned with modulo arithmetic.  The
current local time  does not  have to be  derived directly  from the  system
clock for every packet; a clock  based on samples, e.g., incremented by  the
nominal audio  frame duration,  is sufficient.    The whole  seconds  within
NTP time stamps can  be obtained by  adding 2208988800 to  the value of  the
standard Unix  clock (generated,  for example,  by  the gettimeofday  system
call), which starts from the  year 1970.  For  the RTP time stamp, only  the
least significant 16 bits of the second are used.
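
The conversion just described can be sketched as follows.  The function
name rtp_timestamp is hypothetical; its arguments are assumed to be the
seconds and microseconds of the Unix clock, for example as returned by
gettimeofday, and the result is a timestamp in units of 1/65536 of a
second.

```c
#include <stdint.h>

#define NTP_UNIX_OFFSET 2208988800u   /* seconds from 1900 to 1970 */

/*
 * Build a 32-bit RTP timestamp: the least significant 16 bits of the
 * NTP seconds, followed by a 16-bit fraction of a second.
 * usec must be less than 1000000.
 */
uint32_t rtp_timestamp(uint32_t unix_sec, uint32_t usec)
{
  uint32_t ntp_sec = unix_sec + NTP_UNIX_OFFSET;   /* modulo 2^32 */
  uint32_t frac    = (uint32_t)(((uint64_t)usec << 16) / 1000000u);

  return (ntp_sec << 16) | (frac & 0xffffu);
}
```

The left shift discards the upper 16 bits of the NTP seconds, which is what
limits the timestamp to a wrap-around time of about 18 hours.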


A.2 Detecting the Beginning of a Synchronization Unit


RTP packets contain a bit flag indicating the end of a synchronization unit.
The following code  fragment determines,  based  on sequence numbers,  if  a
packet is the  beginning of a  synchronization unit.   It  assumes that  the
packet header has been converted to host byte order.


static u_short seq_eos;   /* sequence number of last end-of-unit packet */
rtp_hdr_t *h;             /* fixed header of the current packet */
static int flag;          /* set once an end-of-unit packet was seen */

if (h->s) {
  flag    = 1;
  seq_eos = h->seq;
}
/* the 16-bit difference handles wrap-around of the sequence number */
else if (flag && ((u_short)(h->seq - seq_eos) < 32768)) {
  flag = 0;
  /* handle beginning of synchronization unit */
}



A.3 Demultiplexing and Locating the Synchronization Source


The combination  of  destination  address,   destination  port  and  channel
identifier determines the channel.  For each channel, the receiver maintains
a list of all sources, content and synchronization sources alike, in a table
or other suitable data structure.   Synchronization sources are stored  with
a content source value of  zero.  When an  RTP packet arrives, the  receiver
determines its network  source address and  port (from information  returned
by the operating system), synchronization  source (SSRC option) and  content
source(s) (CSRC  option).    To locate  the  table entry  containing  timing
information, mapping from content descriptor  to actual encoding, etc.,  the
receiver sets the content source to zero and locates a table entry based  on
the triple (transport source address, synchronization source identifier,
0).

The receiver identifies  the contributors to  the packet  (for example,  the
speaker who is  heard in  the packet) through  the list  of content  sources
carried in the CSRC option.   To locate the  table entry, it matches on  the
triple (network address and port, synchronization source identifier, content
source).

Note that  since  network  addresses  are  only  generated  locally  at  the
receiver, the receiver can choose whatever format seems most appropriate for
matching.  For example, a Berkeley Unix-based system may use struct sockaddr
data types if it expects network sources with non-IP addresses.


A.4 Parsing RTP Options


The following  code  segment  walks  through  the  RTP options,   preventing
infinite loops due to zero and invalid length fields.  Structure definitions
are valid for big-endian architectures only.


u_long len;       /* length of RTP packet in bytes */
u_long *pt;       /* pointer to the current option */
rtp_hdr_t *h;     /* fixed header */
rtp_t *r;         /* options */

if (h->o) {
  for (pt = (u_long *)(h+1);; pt += r->csrc.length) {
    /* the first word of the option must lie within the packet */
    if ((char *)(pt + 1) - (char *)h > len) return -1;
    r = (rtp_t *)pt;

    /* invalid length field: zero, or option extends beyond the packet */
    if (r->csrc.length == 0 ||
        (char *)(pt + r->csrc.length) - (char *)h > len) return -1;

    switch(r->csrc.type) {
      case RTP_BYE:
        /* handle BYE option */
        break;
      case RTP_CSRC:
        /* handle CSRC option */
        break;

        /* ... */

      default:
        /* undefined option */
        break;
    }
    if (r->csrc.final) break;
  }
}



A.5 Determining the Expected Number of RTP Packets


The number of packets expected can  be computed by the receiver by  tracking
the first sequence number received (seq0), the last sequence number
received (seq), and the number of complete sequence number cycles:


expected = cycles * 65536 + seq - seq0 + 1;


The cycle count is updated for each packet, where seq_prior is the  sequence
number of the prior packet:



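unsigned long seq, seq_prior;

if (seq < seq_prior && seq_prior - seq > 32768)
  cycles++;    /* sequence number wrapped around */
else if (seq > seq_prior && seq - seq_prior > 32768)
  cycles--;    /* late packet from before the wrap-around */

seq_prior = seq;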




Acknowledgments


This  memorandum  is  based  on  discussions  within  the  IETF  audio-video
transport working group chaired by Stephen Casner.  The current protocol has
its origins in  the Network  Voice Protocol  and the  Packet Video  Protocol
(Danny Cohen  and  Randy Cole)  and  the  protocol implemented  by  the  vat
application (Van  Jacobson and  Steve McCanne).    Stuart Stubblebine  (ISI)
helped with the security aspects of RTP. Ron Frederick (Xerox PARC) provided
extensive editorial assistance.



B Addresses of Authors


Stephen Casner
USC/Information Sciences Institute
4676 Admiralty Way
Marina del Rey, CA 90292-6695
telephone:  +1 310 822 1511 (extension 153)
electronic mail:  casner@isi.edu


Henning Schulzrinne
AT&T Bell Laboratories
MH 2A244
600 Mountain Avenue
Murray Hill, NJ 07974-0636
telephone:  +1 908 582 2262
facsimile:  +1 908 582 5809
electronic mail:  hgs@research.att.com



References


 [1] D. E. Comer, Internetworking with TCP/IP, vol. 1.  Englewood Cliffs,
     New Jersey:  Prentice Hall, 1991.

 [2] J.  Postel, ``Internet protocol,''  Network Working  Group Request  for
     Comments RFC 791, Information Sciences Institute, Sept. 1981.

 [3] International  Standards   Organization,  ``ISO/IEC  DIS   10646-1:1993
     information technology -- universal multiple-octet coded  character set
     (UCS) -- part I: Architecture and basic multilingual plane,'' 1993.

 [4] The Unicode Consortium, The Unicode Standard.  New York, New York:
     Addison-Wesley, 1991.

 [5] D. L. Mills, ``Network time protocol (version 3) -- specification,
     implementation and analysis,'' Network Working Group Request for
     Comments RFC 1305, University of Delaware, Mar. 1992.

 [6] S.  Kent,  ``Understanding  the  Internet  certification system,''   in
     Proceedings of the International Networking Conference (INET), (San
     Francisco, California),  pp. BAB--1 -- BAB--10, Internet Society,  Aug.
     1993.

 [7] D. Balenson, ``Privacy enhancement for internet electronic mail:   Part
     III:  Algorithms,  modes,  and  identifiers,''  Network  Working  Group
     Request for Comments RFC 1423, IETF, Feb. 1993.

 [8] V.  L. Voydock  and S.  T. Kent,  ``Security  mechanisms in  high-level
     network protocols,'' ACM Computing Surveys, vol. 15, pp. 135--171,
     June 1983.

 [9] B. S. Kaliski, Jr., ``The MD2 message-digest algorithm,'' Network
     Working Group  Request for  Comments RFC 1319,  RSA Laboratories,  Apr.
     1992.

[10] R. Rivest, ``The MD5 message-digest algorithm,'' Network  Working Group
     Request for Comments RFC 1321, IETF, Apr. 1992.

[11] C.  Topolcic, S.  Casner,  C. Lynn,  Jr.,  P.  Park, and  K.  Schroder,
     ``Experimental internet  stream protocol, version 2 (ST-II),''  Network
     Working  Group  Request   for  Comments  RFC  1190,  BBN  Systems   and
     Technologies, Oct. 1990.