INTERNET-DRAFT                                              John Lazzaro
June 28, 2002                                             John Wawrzynek
Expires: December 28, 2002                                   UC Berkeley



              The MIDI Wire Protocol Packetization (MWPP)

                 <draft-ietf-avt-mwpp-midi-rtp-04.txt>


Status of this Memo

This document is an Internet-Draft and is subject to all provisions of
Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups.  Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

                                Abstract

     The MIDI Wire Protocol Packetization (MWPP) is a general-purpose
     RTP packetization for the MIDI command language. MWPP is suitable
     for use in both interactive applications (such as the pseudo-wire
     emulation of MIDI cables) and content-delivery applications (such
     as MIDI file streaming).  MWPP is designed for use over unicast and
     multicast UDP, and defines MIDI-specific resiliency tools for the
     graceful recovery from packet loss. In addition, a lightweight
     configuration option supports the efficient use of MWPP over TCP.
     MWPP-specific SDP parameters support the customization of MWPP
     behavior (including the MIDI rendering method) during session
     setup. MWPP is compatible with the MPEG-4 generic RTP payload
     format, to support the MPEG 4 Audio object types for General MIDI,
     DLS2, and Structured Audio.





Lazzaro/Wawrzynek                                               [Page 1]


INTERNET-DRAFT                                              28 June 2002


0. Change Log for <draft-ietf-avt-mwpp-midi-rtp-04.txt>

Most of the changes in -04 are editorial in nature. The document has
been reorganized, and many sections have been rewritten, to be an easier
read for MIDI experts who are new to the IETF multimedia standards.
Specifically:

  o  Section 1 begins with an MWPP-centric introduction to the
     IETF multimedia suite, to show where MWPP fits into the
     world of RTP, RTSP, SIP, SDP, and negotiation via the
     offer/answer protocol.

  o  Section 2 systematically explains the MWPP semantics of
     every RTP header field, and in the process acts as a
     brief introduction to key RTP concepts. The maximum and
     minimum MWPP packet sizes are also discussed, with a brief
     diversion into why ROHC makes the 12-octet RTP header
     size a non-issue.

  o  The definitions of SDP parameters have been moved out of the
     MWPP packetization sections (Sections 2-5) and into a set of
     appendices (Appendices C.1-5).

  o  Section 6 has been rewritten to show complete SDP stream
     descriptions for minimal MWPP streams that use the
     mpeg4-generic and mwpp MIME types. In the process, it acts
     as a brief introduction to SDP itself.
     Section 6 also introduces the SDP parameters in the C.1-5
     Appendices, by describing how each parameter adds new
     features to the "minimal" streams.

Thanks to Dominique Foder, Phil Kerr, Chris Grigg, and Martijn Sipkema
for suggestions that led to this editorial overhaul. In addition, -04
includes the following changes from -03:

  o  In Section 3, the running status description emphasizes
     that System Exclusive messages cancel running status
     (thanks to Martijn Sipkema). The "dropped F7" construction
     now uses 0xF5 as the signal octet (thanks to Dominique Foder).

  o  The SDP rtpmap lines for mpeg4-generic and mwpp are now
     identical, apart from the MIME type names. The recj
     (Appendix C.1) and midiport (Appendix C.4) parameters
     now code data formerly coded in the mwpp rtpmap line.

  o  Appendix A.1 adds new definitions for finished and
     unfinished commands.

  o  In Appendix A.4, the coding method for Chapter N has
     been modified to support 128 concurrent NoteOn commands.
     In addition, the definition of the Y bit of a note log
     has been modified (thanks to Dominique Foder).

  o  The first part of Appendix B.3 now provides context for
     the coding method used in Chapter Q, and also provides a
     rationale for the QNOTE field (thanks to Dominique Foder).

  o  Appendix B.4 has been largely rewritten. It now provides
     context for the Chapter E coding method, and it explains
     the use of the D and POINT bits in decoding the PARTIAL
     field. In addition, it explains the transmission delay
     offset of the COMPLETE field, and it shunts unfinished
     MTC Full Frame commands to Chapter X (thanks to
     Dominique Foder).

  o  The first part of Appendix B.5 has been rewritten, to
     explain Chapter X behavior for segmented SysEx commands
     whose segments appear across several packets (thanks to
     Phil Kerr), and to include the coding of unfinished Full
     Frame commands into Chapter X.

  o  The SDP parameters for MIDI wire protocol timestamp
     semantics are now in Appendix C.2. The editorial text
     describing these parameters has been rewritten and
     expanded for clarity. The mperiod parameter now has
     the units of the RTP timestamp field (thanks to
     Dominique Foder).

  o  The MWPP description of the standard SDP parameter
     maxptime is rewritten for clarity, and is now referenced
     correctly. It resides in Appendix C.3 (thanks to
     Dominique Foder).

  o  The former rtpmap parameter midiport has now become the
     SDP parameter midiport, and its semantics have been
     extended to be more useful. Its new companion parameter,
     zerosync, handles MWPP applications that are not amenable
     to RTCP-based NTP synchronization.

  o  An extensible SDP parameter, render, defines the MIDI
     rendering method a receiver uses to turn MIDI into audio. The
     memo fully defines one non-trivial value for render:
     sasc, a flexible method for specifying the
     AudioSpecificConfig() for mpeg4-generic, thereby
     supporting General MIDI, DLS2, and Structured Audio.
     Thanks to Jan van der Meer and Chris Grigg.


1. Introduction

The Internet Engineering Task Force (IETF) has developed a set of
focused tools for multimedia networking ([2] [9] [10] [12]). These tools
can be combined in different ways to support a variety of real-time
applications over Internet Protocol (IP) networks.

For example, to support IP telephony, applications might use the Session
Initiation Protocol (SIP, [10]) to set up the phone call. Call setup
might include negotiations (using the SIP offer/answer protocol [11]) to
agree on a common audio codec.  These negotiations would use the Session
Description Protocol (SDP, [9]) to describe candidate codecs.  After the
call is set up, audio data would flow between the participants using the
Real-time Transport Protocol (RTP, [2]) under the Audio/Video Profile
(RTP/AVP, [3]).

The IETF tools used in this telephony example (SIP, SDP, RTP/AVP) might
be combined in a different way to support a content streaming
application, perhaps in conjunction with other tools (such as the Real
Time Streaming Protocol (RTSP, [12])).

This memo extends two of the IETF tools (RTP and SDP) to support the
Musical Instrument Digital Interface (MIDI) standard for musical
instrument control [1]. These extensions support both interactive
applications (such as low-latency emulation of MIDI cables) and content-
delivery applications (such as MIDI File streaming) over local-area and
wide-area IP networks.

We extend RTP by adding a new packetization, the MIDI Wire Protocol
Packetization (MWPP), to the AVP profile. We extend SDP by defining a
comprehensive set of MWPP-specific SDP parameters. These SDP parameters
support the configuration and negotiation of MIDI endpoint behaviors
using SIP, RTSP, and other session setup tools.


1.1 MWPP Overview

The first part of this memo (Sections 2-5) defines MWPP.

The MIDI standard defines a command set that describes sound as a series
of events (NoteOn command to start a musical note event, NoteOff command
to end a note, etc.). MIDI commands execute on one of the 16 voice
channels (usually a voice channel is devoted to a single instrument
timbre) or on the special Systems channel.

MWPP layers a single MIDI command stream (16 voice channels + System
channel) onto an RTP stream. MWPP may also be layered over the
mpeg4-generic RTP packetization [4], to support the MPEG 4
Audio object types [5] that use the Structured Audio [5], DLS2 [18], and
General MIDI [1] sound synthesis systems.

MWPP supports both of the command execution timing methods defined in
the MIDI standard: the implicit "time-of-arrival" code used in the MIDI
wire protocol (a networking standard for the remote control of musical
instruments over short asynchronous serial lines), and the explicit
timestamps of the MIDI File Format (a standard for representing complete
musical performances in off-line storage).

Section 2 of this memo introduces the modular design of MWPP
packetization.  The simplest form of MWPP uses the MIDI command section
(described in Section 3) as a complete self-framed RTP payload. This
lightweight version of MWPP is suitable for use over reliable transport
such as TCP.

MWPP is also suitable for use over unreliable transport such as unicast
and multicast UDP. MWPP provides resiliency by inserting a recovery
journal section (described in Sections 4 and 5 and Appendices A.1-8 and
B.1-5) into each RTP packet. The recovery journal codes the recent
history of the stream.

1.2 MWPP-specific SDP Overview

The second part of this memo (Section 6 and Appendices C.1-5) extends
the Session Description Protocol (SDP, [9]) for use with MWPP, by
defining new SDP parameters. These parameters may be used to customize
the configuration of an MWPP session, by using SDP in conjunction with
session setup tools like SIP [10, 11] or RTSP [12].

The MWPP-specific SDP parameters provide tools for structuring multiple
MWPP streams (Appendix C.4), setting the resiliency configuration
(Appendix C.1), and customizing the MWPP timestamp semantics (Appendix
C.2).

In addition, an extensible SDP parameter, render, configures the method
of rendering the MIDI command stream into audio output (Appendix C.5).
For example, the render parameter value sasc may be used to select and
initialize the General MIDI [1], DLS2 [18], and Structured Audio [5]
synthesis systems.

1.3 Memo Scope Discussion

The scope of this memo is limited in several respects. This memo
normatively defines the syntax and semantics of the RTP and SDP MIDI
extensions. However, this memo does not define algorithms for sending
and receiving MWPP packets. Ancillary IETF documents (in preparation)
provide informative guidance on MWPP algorithms, as do related
conference publications [6] [8] and software distributions [7].

The scope of this memo is also limited in that it defines MIDI
extensions for RTP and SDP, but it does not define profiles for using
RTP, SDP and other IETF tools in any specific MIDI application domain.
We expect other documents, from the IETF or from other organizations, to
define profiles that are based on MWPP, but this memo does not.


2. MWPP Packet Format

RTP defines a media stream as a sequence of logical packets that share a
common format. The packet format consists of two parts: the RTP header,
whose syntax is independent of the stream media format, followed by the
packet payload, whose syntax is customized for the stream media format.
Figure 1 shows this format for MWPP RTP packets (vertical space
delineates the header from the payload).

We describe RTP packets as "logical" packets to highlight the fact that
RTP itself does not define a transport protocol. Instead, RTP packets
are mapped onto network protocols (such as unicast UDP, multicast UDP,
or TCP) by an application [13].

This section describes the MWPP RTP header fields and the MWPP payload
structure, in separate sub-sections.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |        Sequence number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             SSRC                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             CSRCs                             |
|                              ...                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     MIDI command section ...                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Recovery journal ...                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 1 -- MWPP packet format


2.1 RTP Header

Although the RTP header syntax is standardized in [2], many aspects of
the semantics of RTP header fields may vary to meet the needs of the
payload media type. In this sub-section, we describe the RTP header
semantics for MWPP.

2.1.1 RTP Header Configuration Octets

Header octet 0 configures the packet layout. The V field codes the RTP
version number (currently 2).  The P bit signals the presence of padding
octets at the end of the packet.  The X bit indicates the presence of
payload-specific header extensions (MWPP defines no extensions, and so X
= 0).

The CC field is non-zero if an RTP packet codes the combined output of
multiple sources. The CC field codes the number of CSRC fields at the
end of the header (CSRC fields identify the sources contributing to the
combined output). For the typical MWPP case of a single source, CC is
set to 0, and no CSRC fields appear at the end of the RTP header.

For most uses of MWPP, header octet 0 sets V = 2 and P = X = CC = 0,
yielding a 12-octet RTP header size for every packet in the stream.
Some MWPP applications may find this header overhead unacceptable. RTP
header compression standards [14] adapt RTP for use in environments
where bandwidth is at a premium; these standards compress both the RTP
header and the headers of the underlying network protocols (UDP, IP,
etc.).

Header octet 1 contains the marker bit M and the payload type field PT.
MWPP sets the M bit to 1, to preserve compatibility with the
mpeg4-generic RTP packetization [4]. The 7-bit PT field codes a value
that indicates the payload type is MWPP. The numeric value that codes an
MWPP payload type is fixed during session configuration, using the SDP
rtpmap attribute (see Sections 6.1 and 6.2 for examples).
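
As an illustration only, the Python sketch below packs and unpacks the
fixed 12-octet RTP header for the common MWPP configuration described
above (V = 2, P = X = CC = 0, M = 1). The function names are
hypothetical, and the payload type 96 in the usage line is an assumed,
dynamically assigned value; a real session fixes PT through the SDP
rtpmap attribute.

```python
import struct

# Network (big-endian) byte order: 1 + 1 + 2 + 4 + 4 = 12 octets, no padding.
RTP_HEADER_FORMAT = "!BBHII"

def pack_mwpp_rtp_header(pt: int, seq: int, timestamp: int, ssrc: int) -> bytes:
    """Pack a 12-octet RTP header with V=2, P=X=CC=0 and M=1 (hypothetical helper)."""
    if not 0 <= pt <= 0x7F:
        raise ValueError("PT is a 7-bit field")
    octet0 = 2 << 6            # V = 2 in the top two bits; P, X and CC are zero
    octet1 = 0x80 | pt         # M = 1 (mpeg4-generic compatibility), then PT
    return struct.pack(RTP_HEADER_FORMAT, octet0, octet1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def unpack_rtp_header(data: bytes) -> dict:
    """Split the first 12 octets of an RTP packet into named header fields."""
    octet0, octet1, seq, ts, ssrc = struct.unpack(RTP_HEADER_FORMAT, data[:12])
    return {
        "V": octet0 >> 6, "P": (octet0 >> 5) & 1, "X": (octet0 >> 4) & 1,
        "CC": octet0 & 0x0F, "M": octet1 >> 7, "PT": octet1 & 0x7F,
        "seq": seq, "timestamp": ts, "ssrc": ssrc,
    }

header = pack_mwpp_rtp_header(pt=96, seq=1234, timestamp=5678, ssrc=0xDEADBEEF)
print(len(header), hex(header[0]), hex(header[1]))  # 12 0x80 0xe0
```

A stream that used CSRCs or padding would need a variable-length
header, which this sketch deliberately does not handle.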

2.1.2 RTP Header Sequence Number

Header octets 2 and 3 code the RTP packet sequence number, a 16-bit
field interpreted as an unsigned integer (all integer fields in RTP and
MWPP are in the (big-endian) IETF network byte order). In MWPP, the RTP
sequence number increments by one (modulo 65536) for each packet sent in
the stream.  As is standard in RTP, the sequence number is initialized
to a randomly chosen value.

In this memo, we also refer to the 32-bit extended packet sequence
number, computed (by senders) or inferred (by receivers) by robustly
tracking rollovers of the 16-bit RTP sequence number.  Note that in a
multicast environment, different receivers in the same session may infer
different extended packet sequence numbers, depending on when the
receiver joined the session.
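
A simplified Python sketch of how a receiver might infer the 32-bit
extended sequence number. The class name is hypothetical, and the
heuristic (a packet less than half the 16-bit sequence space ahead of
the highest number seen so far is treated as in order) is an assumption
of this sketch; a production receiver would add the probation and
resynchronization checks described in the RTP specification.

```python
from typing import Optional

class ExtendedSeqTracker:
    """Infer 32-bit extended sequence numbers from 16-bit RTP sequence numbers.

    Simplified sketch: no probation or resynchronization logic.
    """

    def __init__(self, first_seq: int):
        self.cycles = 0                    # rollovers of the 16-bit counter seen so far
        self.max_seq = first_seq & 0xFFFF  # highest sequence number seen so far

    def update(self, seq: int) -> Optional[int]:
        seq &= 0xFFFF
        forward = (seq - self.max_seq) & 0xFFFF  # distance ahead of max_seq, mod 2^16
        if forward < 0x8000:                     # assumption: in order (or a duplicate)
            if seq < self.max_seq:               # the 16-bit counter wrapped past 0
                self.cycles += 1
            self.max_seq = seq
            return ((self.cycles << 16) | seq) & 0xFFFFFFFF
        # Late (reordered) packet. If it is numerically larger than max_seq,
        # it was sent before the most recent rollover.
        if seq > self.max_seq:
            if self.cycles == 0:
                return None                      # would fall below extended number 0
            return (((self.cycles - 1) << 16) | seq) & 0xFFFFFFFF
        return ((self.cycles << 16) | seq) & 0xFFFFFFFF

tracker = ExtendedSeqTracker(65534)
print([tracker.update(s) for s in (65534, 65535, 0, 65535, 1)])
# [65534, 65535, 65536, 65535, 65537]
```

Because each receiver starts counting from the first packet it sees,
two receivers that join a multicast session at different times would
obtain different extended numbers for the same packet, as noted above.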

2.1.3 RTP Header Timestamp

Header octets 4-7 code the RTP timestamp, a 32-bit field interpreted as
an unsigned integer. The RTP timestamp sets the base timestamp value for
the packet. The MWPP payload codes MIDI command timestamps relative to
this base timestamp value. If the MIDI command section of the MWPP
payload contains no MIDI commands, the RTP timestamp indicates the
instant the RTP packet was encoded.

The units for the RTP timestamp are fixed during session configuration,
using the SDP rtpmap parameter srate (see Section 6). For example, if
configuration sets srate to a value of 44100 Hz, two MWPP packets whose
base timestamp values differ by 2 seconds have RTP timestamp fields that
differ by 88200.

MWPP RTP timestamps do not necessarily increment at a fixed rate. The
timestamps for two sequential RTP packets may be identical, or the
second packet may have a timestamp arbitrarily larger than the first
packet (modulo 2^32). As is standard in RTP, the timestamp field is
initialized to a randomly chosen value.

MWPP defines the length of media time a packet encodes as the RTP
timestamp difference (modulo 2^32) between the packet's successor and
the packet itself. By default, the media time for a packet may be
arbitrarily long. However, a maximum media time for MWPP packets in a
stream may be set during session configuration, via the SDP parameter
maxptime (see Appendix C.3).
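
The timestamp arithmetic in this sub-section can be sketched in Python
as follows. The helper names are hypothetical; srate stands for the
clock rate fixed during session configuration, and rounding to the
nearest tick is an assumption of the sketch.

```python
RTP_TS_MODULUS = 1 << 32  # RTP timestamps are 32-bit unsigned integers

def advance_rtp_timestamp(base_ts: int, seconds: float, srate: int) -> int:
    """RTP timestamp `seconds` after base_ts, for a clock rate of srate Hz."""
    return (base_ts + round(seconds * srate)) % RTP_TS_MODULUS

def packet_media_time(ts: int, next_ts: int) -> int:
    """Media time a packet encodes: its successor's timestamp minus its own, mod 2^32."""
    return (next_ts - ts) % RTP_TS_MODULUS

print(advance_rtp_timestamp(0, 2.0, 44100))  # 88200, the 44100 Hz example above
print(packet_media_time(2**32 - 10, 20))      # 30: the difference survives wraparound
```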

2.1.4 RTP Header SSRC

Header octets 8-11 form a unique 32-bit SSRC value that identifies the
sender of the RTP stream. These SSRC values are used to identify session
participants in the RTP Control Protocol (RTCP, [2]), the companion
back-channel protocol to RTP.

RTCP lets senders and receivers exchange monitoring data about the
forward RTP streams. As described in Section 5 of this memo, RTCP fields
may be useful in implementations of the MWPP recovery journal system.

2.2 MWPP Payload

As shown in Figure 1, an MWPP packet may consist of two sections: the
MIDI command section and the recovery journal.

The MIDI command section codes a (possibly empty) list of MIDI commands,
and thus provides the essential service of MWPP. The MIDI command
section is required to appear in the payload of every packet of a valid
MWPP stream.  Section 3 describes the internal structure of the MIDI
command section.

The recovery journal codes a recent history of the stream, to provide
resiliency. Sections 4-5 and Appendices A.1-8 and B.1-5 describe the
internal structure of the recovery journal.

By default, MWPP streams that use unreliable transport (such as UDP) use
the recovery journal, and MWPP streams that use reliable transport (such
as TCP) do not. The SDP parameter recj overrides this default behavior.
See Appendix C.1.1 for details.

If an MWPP stream uses the recovery journal, the recovery journal
section MUST appear in every packet in the stream. If an MWPP stream
does not use the recovery journal, a recovery journal section never
appears in a packet in the stream.

2.2.1 MWPP Payload Size

The MIDI command section and the recovery journal section both have
variable-length formats. The MIDI command section has a minimum length
of 1 octet (a one-octet header coding an empty MIDI list) and a maximum
length of 16385 octets (a two-octet header plus a 16383-octet MIDI
list). The recovery journal section has a minimum length of 3 octets
and a maximum length of 17394 octets.

However, the practical maximum length of an MWPP RTP packet depends on
the RTP network protocol mapping. For example, if RTP logical packets
are mapped one-to-one to UDP IP packets, the Maximum Transmission Unit
(MTU) of the IP network sets the recommended maximum length of the
encapsulated MWPP RTP packet (IP header size + UDP header size + RTP
header size + MWPP payload size, if header compression is not in use).
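
A back-of-the-envelope Python sketch of the payload budget, assuming no
header compression, an IPv4 header without options (or a fixed IPv6
header with no extension headers), and a 1500-octet MTU typical of
Ethernet. The function and constant names are hypothetical.

```python
IPV4_HEADER = 20  # octets: minimum IPv4 header, no options
IPV6_HEADER = 40  # octets: fixed IPv6 header, no extension headers
UDP_HEADER = 8    # octets
RTP_HEADER = 12   # octets: V=2, P=X=CC=0 (no CSRCs, no header extension)

def max_mwpp_payload(mtu: int, ip_header: int = IPV4_HEADER) -> int:
    """Largest MWPP payload that fits in one unfragmented, uncompressed packet."""
    return mtu - ip_header - UDP_HEADER - RTP_HEADER

print(max_mwpp_payload(1500))               # 1460
print(max_mwpp_payload(1500, IPV6_HEADER))  # 1440
```

Both results are far below the section maxima given above, which is why
the network MTU, rather than the MWPP format, usually bounds packet
size in practice.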


2.2.2 MWPP Payload Namespace

An MWPP stream encodes MIDI content for a single MIDI command stream
namespace (16 MIDI voice channels + MIDI Systems). Per RTP rules, this
MWPP stream maps onto a single network flow (defined by a type of
transport plus a unique flow identifier, such as UDP over IP with a
certain IPv4 address and a certain UDP port).

Some application domains may have more complicated namespace
requirements. For example, an application may wish to send two
synchronized MIDI namespaces over RTP, to support 32 MIDI voice
channels. Or, as an alternative example, an application may wish to
split the data of a single MIDI namespace over two network flows, to use
UDP for real-time data and TCP for bulk data.

The "RTP way" to address these requirements is to send several MWPP RTP
streams in the same session. The namespace and synchronization
relationships of multi-stream MWPP sessions are set up during session
configuration, via MWPP SDP fmtp parameters defined in Appendix C.4.

2.2.3 MWPP Payload Rendering

In many MIDI applications, the MIDI sender has some sort of model of the
method the receiver uses to render MIDI into audio (or sometimes, into
control actions such as the rewind of a tape deck or the dimming of
stage lights).

This rendering model may be standards-based, such as the General MIDI
[1], DLS2 [18], and Structured Audio [5] rendering models supported as
synthetic profiles in MPEG 4 [5]. Alternatively, the rendering model may
be proprietary, specifying that a particular hardware or software
synthesizer product is listening on a certain MIDI channel, and uses a
certain patch parameter set.

The rendering model for an MWPP stream is set up during session
configuration, via MWPP SDP fmtp parameters defined in Appendix C.5.
Once a session begins, the MWPP RTP stream may act to alter the
rendering model (for example, by using System Exclusive commands to
modify synthesizer patches). Alternatively, depending on the IETF
session initiation tool and the chosen MIDI rendering model, it may be
possible for the sender to alter the rendering model during a session
by updating SDP parameters via the session initiation tool.


3. MIDI Command Section

Figure 2 shows the format of the MIDI command section.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|B|Z| LEN ...  |          MIDI list ...                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 2 -- MIDI command section

The MIDI command section begins with a variable-length header.  The
header field LEN codes the length (in units of octets) of the MIDI list
that follows the header.

If the header flag B is 0, the header is one octet long, and LEN is a
6-bit field, supporting a maximum MIDI list length of 63 octets. If B is
1, the header is two octets long, and LEN is a 14-bit field, supporting
a maximum MIDI list length of 16383 octets.

A LEN value of 0 is legal, and codes an empty MIDI list.  If the MIDI
list is empty, the RTP timestamp indicates the instant the RTP packet
was encoded.
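
A Python sketch of reading and writing the MIDI command section header,
based on our reading of Figure 2 (B in the most significant bit of the
first octet, Z next, LEN in the remaining 6 or 14 bits). The names are
hypothetical, and choosing the one-octet header whenever LEN fits in 6
bits is an assumption of the sketch; the text does not require it.

```python
from typing import NamedTuple

class CommandSectionHeader(NamedTuple):
    z: int           # 1 if the MIDI list begins with Delta Time 0
    length: int      # LEN: length of the MIDI list in octets
    header_len: int  # 1 (B = 0) or 2 (B = 1) octets

def parse_command_section_header(data: bytes) -> CommandSectionHeader:
    b = data[0] >> 7
    z = (data[0] >> 6) & 1
    if b == 0:
        return CommandSectionHeader(z, data[0] & 0x3F, 1)  # 6-bit LEN
    # 14-bit LEN: low 6 bits of octet 0, then all 8 bits of octet 1
    return CommandSectionHeader(z, ((data[0] & 0x3F) << 8) | data[1], 2)

def pack_command_section_header(length: int, z: int) -> bytes:
    if not 0 <= length <= 0x3FFF:
        raise ValueError("LEN must be between 0 and 16383")
    if length <= 0x3F:
        return bytes([(z << 6) | length])                           # B = 0
    return bytes([0x80 | (z << 6) | (length >> 8), length & 0xFF])  # B = 1

print(parse_command_section_header(pack_command_section_header(300, 0)))
# CommandSectionHeader(z=0, length=300, header_len=2)
```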

If LEN is nonzero, the MIDI list has the structure shown in Figure 3.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Delta Time 0 (if Z = 1)   |     MIDI Command 0 ...         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Delta Time 1          |     MIDI Command 1 ...         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Delta Time 2          |     MIDI Command 2 ...         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            .....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Delta Time N          |     MIDI Command N ...         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 3 -- MIDI list structure

If the header flag Z is 1, the MIDI list begins with a complete MIDI
command (MIDI Command 0) preceded by a delta time (Delta Time 0). If Z
is 0, the Delta Time 0 field is not present in the MIDI list, and MIDI
Command 0 has an implicit delta time of 0.  The MIDI list may also
encode a list of N additional complete MIDI commands.
Each additional command is preceded by a delta time.

The MWPP delta time syntax is a modified form of the MIDI File delta
time syntax [1]. MWPP delta times use 1-4 octet fields to encode 32-bit
unsigned integers. Figure 4 shows the encoded and decoded forms of delta
times. Note that delta time values may be legally encoded in multiple
formats; for example, there are four legal ways to encode the zero delta
time (0x00, 0x8000, 0x800000, 0x80000000).

  One-Octet Delta Time:

     Encoded form: 0ddddddd
     Decoded form: 00000000 00000000 00000000 0ddddddd

  Two-Octet Delta Time:

     Encoded form: 1ccccccc 0ddddddd
     Decoded form: 00000000 00000000 00cccccc cddddddd

  Three-Octet Delta Time:

     Encoded form: 1bbbbbbb 1ccccccc 0ddddddd
     Decoded form: 00000000 000bbbbb bbcccccc cddddddd

  Four-Octet Delta Time:

     Encoded form: 1aaaaaaa 1bbbbbbb 1ccccccc 0ddddddd
     Decoded form: 0000aaaa aaabbbbb bbcccccc cddddddd

              Figure 4 -- Decoding delta time formats
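
Following Figure 4, each delta time octet carries 7 value bits, and a
set high bit means another octet follows. The Python sketch below
(hypothetical function names) decodes any legal form and encodes using
the shortest one; choosing the shortest form is an assumption of the
sketch, since the memo allows several encodings of the same value.

```python
from typing import Tuple

def decode_delta_time(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode the 1-4 octet delta time at data[pos]; return (value, next_pos)."""
    value = 0
    for i in range(4):
        octet = data[pos + i]
        value = (value << 7) | (octet & 0x7F)  # each octet carries 7 value bits
        if not octet & 0x80:                   # high bit clear: final octet
            return value, pos + i + 1
    raise ValueError("delta time longer than 4 octets")

def encode_delta_time(value: int) -> bytes:
    """Encode value in the shortest of its legal forms."""
    if not 0 <= value < (1 << 28):             # 4 octets x 7 bits = 28 bits
        raise ValueError("delta time out of range for a 4-octet field")
    out = [value & 0x7F]                       # final octet: high bit clear
    value >>= 7
    while value:
        out.append(0x80 | (value & 0x7F))      # earlier octets: high bit set
        value >>= 7
    return bytes(reversed(out))

# The four-octet zero delta time from the text decodes to (0, 4).
print(decode_delta_time(bytes([0x80, 0x80, 0x80, 0x00])))
```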

MWPP uses delta times to encode a timestamp for each MIDI command. The
timestamp for MIDI Command K is the summation (modulo 2^32) of the RTP
timestamp and decoded delta times 0 through K. All command timestamps in
a packet MUST be less than or equal to the RTP timestamp of the next
packet in the MWPP stream (modulo 2^32).
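
A Python sketch of the command timestamp rule, assuming the delta times
have already been decoded (with 0 supplied for Command 0 when Z = 0).
The helper names are hypothetical, and comparing offsets from the
packet's own RTP timestamp is one reading of the "modulo 2^32"
comparison; it assumes consecutive packets are less than 2^32 ticks
apart.

```python
from typing import List

RTP_TS_MODULUS = 1 << 32

def command_timestamps(rtp_ts: int, deltas: List[int]) -> List[int]:
    """Timestamp of Command K = RTP timestamp + delta times 0..K, mod 2^32."""
    stamps = []
    ts = rtp_ts
    for delta in deltas:
        ts = (ts + delta) % RTP_TS_MODULUS  # running sum of delta times
        stamps.append(ts)
    return stamps

def precedes_next_packet(rtp_ts: int, cmd_ts: int, next_rtp_ts: int) -> bool:
    """Check cmd_ts <= the next packet's RTP timestamp, using offsets from rtp_ts."""
    return (cmd_ts - rtp_ts) % RTP_TS_MODULUS <= (next_rtp_ts - rtp_ts) % RTP_TS_MODULUS

print(command_timestamps(1000, [0, 10, 0, 5]))  # [1000, 1010, 1010, 1015]
```

The repeated 1010 shows two commands coded for simultaneous execution.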

By default, a command timestamp indicates the execution time for the
command. The difference between two timestamps indicates the time delay
between the execution of the commands. This difference may be zero,
coding simultaneous execution. MIDI sources that use explicit command
timestamps, such as the MIDI file format, are simple to transcode into
MWPP streams using these default semantics.

MIDI command sources that use implicit command timing, such as the MIDI
wire protocol, must be annotated with timestamps as part of the MWPP
transcoding process. The hardware and systems environment for an
application may dictate a particular approach to timestamps that is
not a good fit for the default MWPP timestamp semantics. To address
this issue, the semantics of command timestamps may be customized during
session configuration, as described in Appendix C.2.

As a rule, each MIDI Command field in the MIDI list contains a complete
MIDI command, in the binary command format defined in the MIDI standard
[1]. In the remainder of this section, we describe exceptions to this
rule.

The first MIDI channel command in the MIDI list MUST include a status
octet; running status coding, as defined in [1], may be used for all
subsequent MIDI channel commands in the MIDI list.  As in [1], System
Common and System Exclusive messages (0xF0 ... 0xF7) cancel running
status state, but System RealTime messages (0xF8 ... 0xFF) do not affect
running status state.
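
The running status rule can be sketched in Python as a sender-side pass
over complete MIDI commands. The function name is hypothetical; the
sketch assumes each input command carries its status octet, and it
omits a repeated status octet whenever the rule allows, although the
memo makes running status optional.

```python
from typing import List, Optional

def apply_running_status(commands: List[bytes]) -> List[bytes]:
    """Drop repeated channel status octets wherever running status permits."""
    running: Optional[int] = None  # no running status before the first channel command
    out: List[bytes] = []
    for cmd in commands:
        status = cmd[0]
        if status < 0x80:
            raise ValueError("each input command must begin with its status octet")
        if status <= 0xEF:             # channel command (0x80-0xEF)
            if status == running:
                out.append(cmd[1:])    # same status as before: omit it
            else:
                out.append(cmd)        # first channel command, or status changed
                running = status
        elif status <= 0xF7:           # System Common / System Exclusive
            out.append(cmd)
            running = None             # cancels running status
        else:                          # System RealTime (0xF8-0xFF)
            out.append(cmd)            # running status is unaffected
    return out
```

For example, two NoteOn commands separated by a Timing Clock (0xF8)
still share one status octet, while a Tune Request (0xF6) between them
forces the second NoteOn to repeat 0x90.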

In the MIDI wire protocol [1], a System RealTime command may be embedded
inside of another "host" MIDI command.  This syntactic construction is
not supported in MWPP: a MIDI Command field in the MIDI list codes
exactly one complete MIDI command.

To encode an embedded System RealTime command, senders MUST extract the
command from its host, and code it in the MIDI list as a separate
command. The host command and System RealTime command SHOULD appear in
the same MIDI list. The delta time of the System RealTime command SHOULD
result in a command timestamp that encodes the System RealTime command
placement in its original embedded position.
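
A minimal Python sketch of the extraction step, assuming the input holds
exactly one host command as it appeared on the wire, with any System
RealTime octets (0xF8-0xFF) interleaved. The function name is
hypothetical. The returned positions could guide the delta time of each
RealTime command, but mapping a position to a timestamp depends on
arrival times that this sketch does not model.

```python
from typing import List, Tuple

def split_embedded_realtime(wire: bytes) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Return (host command without RealTime octets, [(position, RealTime octet)])."""
    host = bytearray()
    realtime: List[Tuple[int, int]] = []
    for index, octet in enumerate(wire):
        if octet >= 0xF8:                  # System RealTime octet
            realtime.append((index, octet))
        else:
            host.append(octet)             # part of the host command
    return bytes(host), realtime

# A Timing Clock (0xF8) embedded inside a NoteOn command:
print(split_embedded_realtime(bytes([0x90, 0x3C, 0xF8, 0x40])))
```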

Two methods are provided for encoding MIDI System Exclusive (SysEx)
commands in the MIDI list. A SysEx command may be encoded in a MIDI
Command field verbatim: an 0xF0 octet, followed by an arbitrary number
of data octets, followed by an 0xF7 octet.

Alternatively, a SysEx command may be encoded as multiple segments.  The
command is divided into two or more SysEx command segments; each segment
is encoded in its own MIDI Command field in the MIDI list.

MWPP supports segmentation in order to encode SysEx commands that encode
information in the temporal pattern of data octets; by encoding these
commands as a series of segments, each data octet is associated with a
delta time. Segmentation may also be useful in coding very large SysEx
commands across several RTP packets.

To segment a SysEx command, first partition its data octet list into two
or more sublists; each sublist must contain at least one data octet.  To
complete the segmentation, add status octets to the head and tail of
each sublist, as detailed in Figure 5. Figure 6 shows example
segmentations of a SysEx command.

    ----------------------------------------------------------
   | Sublist Position | Head Status Octet | Tail Status Octet |
   |----------------------------------------------------------|
   |      first       |       0xF0        |       0xF0        |
   |----------------------------------------------------------|
   |      middle      |       0xF7        |       0xF0        |
   |----------------------------------------------------------|
   |      last        |       0xF7        |       0xF7        |
    ----------------------------------------------------------

           Figure 5 -- Command Segmentation Status Octets

  Original SysEx command:

     0xF0 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0xF7

  A two-segment segmentation:

     0xF0 0x01 0x02 0x03 0x04 0xF0

     0xF7 0x05 0x06 0x07 0x08 0xF7

  A different two-segment segmentation:

     0xF0 0x01 0xF0

     0xF7 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0xF7

  A three-segment segmentation:

     0xF0 0x01 0x02 0xF0

     0xF7 0x03 0x04 0xF0

     0xF7 0x05 0x06 0x07 0x08 0xF7

  The segmentation with the largest number of segments:

     0xF0 0x01 0xF0

     0xF7 0x02 0xF0

     0xF7 0x03 0xF0

     0xF7 0x04 0xF0

     0xF7 0x05 0xF0

     0xF7 0x06 0xF0

     0xF7 0x07 0xF0

     0xF7 0x08 0xF7


                   Figure 6 -- Example segmentations
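
The rules of Figure 5 can be sketched as a short Python function, shown
only as an illustration. The names segment_sysex and reassemble_sysex
are hypothetical, and the sizes argument (the number of data octets in
each sublist, in order) is an assumption of this sketch. The calls
[4, 4], [1, 7], [2, 2, 4], and [1] * 8 reproduce the four segmentations
of Figure 6.

```python
def segment_sysex(sysex, sizes):
    """Split a complete SysEx command (0xF0 ... 0xF7) into segments.

    sizes gives the number of data octets in each sublist, in order.
    Head and tail status octets follow Figure 5.
    """
    if len(sysex) < 2 or sysex[0] != 0xF0 or sysex[-1] != 0xF7:
        raise ValueError("expected a complete 0xF0 ... 0xF7 SysEx command")
    data = list(sysex[1:-1])
    if len(sizes) < 2 or min(sizes) < 1 or sum(sizes) != len(data):
        raise ValueError("need two or more non-empty sublists covering the data")
    segments, pos = [], 0
    for k, n in enumerate(sizes):
        head = 0xF0 if k == 0 else 0xF7               # only the first opens with 0xF0
        tail = 0xF7 if k == len(sizes) - 1 else 0xF0  # only the last closes with 0xF7
        segments.append([head] + data[pos:pos + n] + [tail])
        pos += n
    return segments


def reassemble_sysex(segments):
    """Inverse of segment_sysex: strip segment status octets, rejoin the data."""
    data = [octet for seg in segments for octet in seg[1:-1]]
    return [0xF0] + data + [0xF7]
```

Because every sublist must hold at least one data octet, a command with
D data octets admits at most D segments, which is why the last example
in Figure 6 has eight.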

The relative ordering of SysEx command segments in a MIDI list must
match the relative ordering of the sublists in the original SysEx
command. Only System RealTime MIDI commands may appear between SysEx
command segments. If the command segments of a SysEx command are placed
in the MIDI lists of two or more RTP packets, the segment ordering rules
apply to the concatenation of all affected MIDI lists.

The MIDI wire protocol [1] permits a "dropped 0xF7" construction for
SysEx commands; in this coding method, the 0xF7 octet is dropped from
the end of the SysEx command, and the status octet of the next MIDI
command acts both to terminate the SysEx command and start the next
command. To encode this construction in MWPP, follow these steps:

  o  Determine the appropriate delta times for the SysEx command and
     the command that follows the SysEx command.

  o  Insert the "dropped" 0xF7 octet at the end of the SysEx command,
     to form the standard SysEx syntax.

  o  Code both commands into the MIDI list using the rules above.

  o  Replace the 0xF7 octet that terminates the verbatim SysEx
     encoding or the last segment of the segmented SysEx encoding
     with a 0xF5 command. This substitution informs the receiver
     of the original dropped 0xF7 coding.
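
A sender-side sketch of these steps, in Python, appears below. The
function split_sysex is hypothetical; it handles only the verbatim
(unsegmented) encoding, omits the delta time step, and assumes that no
System RealTime octets are interleaved in the SysEx data. Inserting
0xF7 and then substituting 0xF5 for it is equivalent to appending 0xF5.

```python
REALTIME = range(0xF8, 0x100)   # System RealTime status octets


def split_sysex(wire):
    """Split wire-protocol octets that begin with a SysEx command.

    Returns (coded_sysex, remainder). If the SysEx used the dropped 0xF7
    construction, coded_sysex ends in 0xF5 rather than 0xF7.
    """
    if not wire or wire[0] != 0xF0:
        raise ValueError("wire data must start with 0xF0")
    for i in range(1, len(wire)):
        octet = wire[i]
        if octet < 0x80:
            continue                        # SysEx data octet
        if octet in REALTIME:
            raise ValueError("interleaved RealTime octets not handled here")
        if octet == 0xF7:                   # normal termination
            return list(wire[:i + 1]), list(wire[i + 1:])
        # Another status octet ended the SysEx: restore the dropped 0xF7,
        # then replace it with 0xF5 to signal the original coding.
        return list(wire[:i]) + [0xF5], list(wire[i:])
    raise ValueError("SysEx command is not terminated")
```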


4. Recovery Journal Overview

In this section we introduce the recovery journal, the MWPP resiliency
tool for unreliable transport. In Section 5, we define the bitfield
format for the recovery journal. In Appendix C.1, we describe SDP
parameters for recovery journal configuration.

A MIDI stream sent over unreliable MWPP is fragile. Consider an MWPP
stream in which one packet codes the start of a trumpet note (via a
NoteOn command in the MIDI command section) and a second packet codes
the end of the note (via a matching NoteOff command). If the second
packet is lost, the trumpet note sustains indefinitely.

One solution to loss recovery is to retransmit lost packets. MWPP over
TCP provides resiliency via packet retransmission (at a lower layer of
the network stack). However, in some MWPP applications packet
retransmission is undesirable. Retransmission adds latency: each lost
packet costs at least one additional round-trip time, and if TCP is
used, head-of-line blocking adds further delay. Simple retransmission is
also unsuitable for multicast applications, due to scaling issues.

A feed-forward approach to resiliency avoids retransmission by using
information encoded in the forward packet stream to guide loss recovery.
Consider this simple resiliency scheme for stuck notes: if a receiver
detects lost RTP packets via sequence number breaks, it issues NoteOff
commands for all active notes as a precaution.  This scheme solves the
problem of notes that sound forever, but the immediate effect on the
stream is jarring: the music stops.
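
The simple scheme above might look like the following Python sketch.
The class NaiveLossHandler is hypothetical and is not part of MWPP. It
treats any unexpected 16-bit sequence number as a loss (so reordered or
duplicated packets also trigger it) and tracks active notes as
(channel, note) pairs with channel values 0-15.

```python
class NaiveLossHandler:
    """Silence every active note whenever a sequence number break is seen."""

    def __init__(self):
        self.expected = None   # next expected RTP sequence number (16-bit)
        self.active = set()    # (channel, note) pairs currently sounding

    def note_on(self, channel, note):
        self.active.add((channel, note))

    def note_off(self, channel, note):
        self.active.discard((channel, note))

    def on_packet(self, seqnum):
        """Call per arriving packet; returns NoteOffs to send first on a break."""
        lost = self.expected is not None and seqnum != self.expected
        self.expected = (seqnum + 1) % 65536
        if not lost:
            return []
        # Stops stuck notes, but also stops every note that should still sound.
        offs = [bytes([0x80 | ch, note, 0]) for ch, note in sorted(self.active)]
        self.active.clear()
        return offs
```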

The MWPP recovery journal system implements feed-forward resiliency in a
more graceful way. Each MWPP packet includes a special section (the
"recovery journal") that codes the recent history of the stream. Upon
detection of a packet loss, a receiver uses the recovery journal history
to guide the stream repair process, fixing long-term problems such as
stuck notes while minimizing audible artifacts.

The recovery journal does not code a literal history of the MIDI stream.
In general, it is not possible to reconstruct the lost MIDI command
stream from the recovery journal contents. Instead, the recovery journal
format codes only the information necessary for the graceful recovery
from packet loss. This coding strategy trades off generality for
bandwidth efficiency [6].

The recovery journal codes the history of the MWPP stream, back to an
earlier packet called the checkpoint packet. The size of this checkpoint
history (a precise term defined in Appendix A.1) is sent in each
recovery journal. A receiver is able to detect if the checkpoint history
is too shallow for a graceful recovery from a particular packet loss
incident.

A sender dynamically controls the size of the recovery journal by
choosing the checkpoint history depth. The sender does not have other
levers for dynamic control, because this memo normatively defines the
length and contents of the recovery journal, given the MIDI stream
contents and checkpoint history depth (static control of the recovery
journal structure is possible during session configuration, via SDP
parameters described in Appendix C.1). Receiver designers rely on the
normative nature of the journal definitions to devise recovery
algorithms, much as audio and video codec designers rely on normative
bitstream definitions to act as a common media language.

Senders may use a variety of open-loop schemes for choosing a
checkpoint history size for each packet: protection of a constant
increment of media time, protection of a constant number of packets,
maximization of protection for an average payload bandwidth, etc.  These
schemes share a common problem: if a receiver has sustained too many
consecutive lost packets, the checkpoint history of the recovery journal
may be too shallow, forcing the receiver to resort to an "ungraceful"
recovery method.

A closed loop approach to checkpoint history management avoids this
problem. Senders monitor the last RTP packet received by each receiver,
via the "extended highest sequence number received" field in standard
RTCP RR packets [2]. If senders do not advance the checkpoint packet to
extended sequence number N until all receivers have received an MWPP
packet with extended sequence number M >= (N - 1), the depth of the
checkpoint history is sufficient for receivers to gracefully recover
from an arbitrary packet loss.

We define the term "guaranteed policy" to describe sending algorithms
that obey the M >= (N - 1) inequality for the checkpoint packet.  A
guaranteed policy MAY use the RTCP method described above to implement
its sending policy, or MAY use other means of direct feedback from
receivers. We reference the guaranteed policy in the definition of the
recovery journal bitfield format in Section 5.
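
The advancement rule can be sketched in Python as follows. The function
and its arguments are hypothetical. It works on extended (32-bit)
sequence numbers, so no wraparound handling is needed, and it assumes
that the sender never moves the checkpoint backward and holds it in
place when it has no receiver reports.

```python
def guaranteed_checkpoint(current, rr_highest):
    """Furthest extended checkpoint seqnum N allowed by the guaranteed policy.

    current:    extended seqnum of the checkpoint packet now in use.
    rr_highest: latest "extended highest sequence number received" (M)
                from the RTCP RR of each receiver the sender knows about.
    """
    if not rr_highest:
        return current              # assumption: no feedback, do not advance
    limit = min(rr_highest) + 1     # largest N with M >= N - 1 for every receiver
    return max(current, limit)      # assumption: never move the checkpoint back
```

For example, with receiver reports of 120, 115, and 130, the slowest
receiver limits the checkpoint to 116.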

The guaranteed policy is multicast compatible, as it may be implemented
via standard RTCP RR packets. However, the guarantee is only in effect
for a receiver if the sender is aware of the receiver in the session. In
practice, this limitation only impacts the start of a stream, as the RTP
standard provides several mechanisms for a receiver to sense that a
sender is aware of its presence.


5. Recovery Journal Format

This section introduces the structure of the recovery journal, and
defines the bitfields of recovery journal headers. Appendices A.2-8 and
B.1-5 complete the bitfield definition of the recovery journal; Appendix
A.1 provides normative definitions for common terms and bitfield
structures used throughout the recovery journal.

The recovery journal has a three-level structure:

  o Top-level header.

  o Channel and system journal headers. Encodes recovery
    information for a single MIDI channel (channel journal)
    and for all MIDI Systems commands (system journal).

  o Chapters. Describes recovery information for a single MIDI
    command type.

Figure 7 shows the top-level structure of the recovery journal.  A
recovery journal consists of a 3-octet header, optionally followed by a
system journal and a list of channel journals.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|S|A|G|Y|TOTCHAN|    Checkpoint Packet Seqnum   |     ...       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   ... System journal ...      |  Channel journals ...         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Figure 7 -- Top-level recovery journal format

If the Y bit is set to 1, a system journal follows the recovery journal
header. If the A bit is set to 1, the recovery journal ends with a list
of (TOTCHAN + 1) channel journals. If A and Y are both zero, the
recovery journal only contains the 3-octet header, and is considered to
be an "empty" journal.

The S (single-packet loss) bit appears in most recovery journal
structures. It helps receivers efficiently parse the recovery journal in
the common case of the loss of a single packet.  Appendix A.1 defines S
bit semantics.

The 16-bit Checkpoint Packet Seqnum field codes the sequence number of
the checkpoint packet for this journal. The choice of the checkpoint
packet sets the depth of the recovery journal history, as defined in
Appendix A.1.

If the choice of the checkpoint packet adheres to the guaranteed policy
defined in Section 4, the G ("guaranteed") bit SHOULD be set to 1. If
the choice of the checkpoint packet does not adhere to the guaranteed
policy, the G bit MUST be set to 0.
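
A receiver might decode the fields of Figure 7 as in the Python sketch
below. The function name and dictionary keys are illustrative only. Bit
positions follow Figure 7, and the Checkpoint Packet Seqnum is read in
network byte order, as is usual for RTP.

```python
def parse_journal_header(buf):
    """Decode the 3-octet top-level recovery journal header (Figure 7)."""
    if len(buf) < 3:
        raise ValueError("recovery journal header is 3 octets")
    b0 = buf[0]
    hdr = {
        "S": (b0 >> 7) & 1,
        "A": (b0 >> 6) & 1,
        "G": (b0 >> 5) & 1,
        "Y": (b0 >> 4) & 1,
        "TOTCHAN": b0 & 0x0F,
        "checkpoint": (buf[1] << 8) | buf[2],   # network byte order
    }
    # A = 1 means (TOTCHAN + 1) channel journals end the recovery journal.
    hdr["channel_journals"] = hdr["TOTCHAN"] + 1 if hdr["A"] else 0
    # A = 0 and Y = 0 means the journal is the 3-octet header alone.
    hdr["empty"] = not hdr["A"] and not hdr["Y"]
    return hdr
```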

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|S| CHAN  |R|      LENGTH       |P|W|N|A|T|C|M|R|  Chapters ... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 8 -- Channel journal format

Figure 8 shows the structure of a channel journal: a 3-octet header,
followed by a list of leaf elements called channel chapters. A channel
journal encodes information about MIDI commands on the MIDI channel
coded by the 4-bit CHAN header field.

The 10-bit LENGTH field codes the length of the channel journal; the R
bit is reserved. The semantics for LENGTH and R fields are uniform
throughout the recovery journal, and are defined in Appendix A.1.

The third octet of the channel journal header is the Table of Contents
(TOC) of the channel journal. The TOC is a set of bits that encode the
presence of a chapter in the journal. Each chapter contains information
about a certain class of MIDI channel command:

   o  Chapter P: MIDI Program Change (0xC)
   o  Chapter W: MIDI Pitch Wheel (0xE)
   o  Chapter N: MIDI NoteOff (0x8), NoteOn (0x9)
   o  Chapter A: MIDI Poly Aftertouch (0xA)
   o  Chapter T: MIDI Channel Aftertouch (0xD)
   o  Chapter C: MIDI Control Change (0xB)
   o  Chapter M: MIDI Parameter System (part of 0xB)

Chapters appear in a list following the header, in order of their
appearance in the TOC. Appendices A.2-8 describe the bitfield format for
each chapter, and define the conditions under which a chapter type MUST
appear in the recovery journal. If any chapter types are required for a
channel, an associated channel journal MUST appear in the recovery
journal.
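
The channel journal header of Figure 8 can be decoded in the same
style. This hypothetical Python sketch reads S, CHAN, LENGTH, and the
TOC, lists the chapters that follow in TOC order, and ignores both R
bits, as Appendix A.1 requires of receivers.

```python
TOC_CHAPTERS = "PWNATCM"   # TOC bits, most significant first; the last bit is R


def parse_channel_journal_header(buf):
    """Decode the 3-octet channel journal header (Figure 8)."""
    if len(buf) < 3:
        raise ValueError("channel journal header is 3 octets")
    s = (buf[0] >> 7) & 1
    chan = (buf[0] >> 3) & 0x0F                  # 4-bit CHAN field
    length = ((buf[0] & 0x03) << 8) | buf[1]     # 10-bit LENGTH field
    toc = buf[2]
    chapters = [name for i, name in enumerate(TOC_CHAPTERS) if toc & (0x80 >> i)]
    return {"S": s, "CHAN": chan, "LENGTH": length, "chapters": chapters}
```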

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|S|D|V|Q|E|X|      LENGTH       |  System chapters ...          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 9 -- System journal format

Figure 9 shows the structure of the system journal: a 2-octet header,
followed by a list of system chapters.  System chapters code information
about a specific class of MIDI Systems command:

   o  Chapter D: Song Select (0xF3), Tune Request (0xF6), Reset (0xFF)
   o  Chapter V: Active Sense (0xFE)
   o  Chapter Q: Sequencer State (0xF2, 0xF8, 0xF9, 0xFA, 0xFB, 0xFC)
   o  Chapter E: MTC Tape Position (0xF1, 0xF0 0x7F 0xcc 0x01 0x01)
   o  Chapter X: System Exclusive (all other 0xF0)

If header bits D, V, Q, or E are set to 1, one chapter for each chapter
type whose associated bit is set appears in a list following the header.
The chapter ordering follows the ordering of chapter header bits in the
header bitfield. If header bit X is set to 1, one or more Chapter X
bitfields appear at the end of the chapter list.
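
A similar Python sketch for the 2-octet system journal header of
Figure 9 follows; as before, the function name is hypothetical. It
reports which of Chapters D, V, Q, and E are present, in header-bit
order, and whether Chapter X bitfields follow them.

```python
SYSTEM_CHAPTERS = "DVQE"   # at most one chapter each, in header-bit order


def parse_system_journal_header(buf):
    """Decode the 2-octet system journal header (Figure 9)."""
    if len(buf) < 2:
        raise ValueError("system journal header is 2 octets")
    b0 = buf[0]
    present = [name for i, name in enumerate(SYSTEM_CHAPTERS) if b0 & (0x40 >> i)]
    return {
        "S": (b0 >> 7) & 1,
        "chapters": present,
        "X": (b0 >> 2) & 1,                   # 1: Chapter X bitfield(s) at the end
        "LENGTH": ((b0 & 0x03) << 8) | buf[1],
    }
```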

Appendices B.1-5 describe the bitfield format for the system chapters,
and define the conditions under which a chapter type MUST appear in the
recovery journal. If any system chapter type is required to appear in
the recovery journal, the system journal MUST appear in the recovery
journal.


6. MWPP and the Session Description Protocol

RTP is a standard for the transport of media streams, but RTP does not
perform session management for the streams it carries. Instead, RTP is
designed to work together with tools that perform session management,
such as the Session Initiation Protocol (SIP, [10]) and the Real Time
Streaming Protocol (RTSP, [12]).

RTP interacts with session management tools via another standard, the
Session Description Protocol (SDP, [9]). SDP is a textual format for
specifying session descriptions. A session description is an ordered
list of declarative statements (or "lines").

A session description maps each RTP stream in the session to a network
transport (for example, unicast UDP at a certain IP number and port
number), and defines the numeric value of the PT field in the RTP header
for the stream. A session description also maps each RTP stream to a
media encoding, and may carry configuration parameters for the media
encoding.

Session management tools like SIP and RTSP coordinate the exchange of
complete session descriptions between session participants.  The
exchange protocol may be unilateral in nature: a sender proposes a
session description, which a receiver must accept in order to join the
session. Alternatively, some exchange protocols, like the SIP
offer/answer model [11], specify negotiation methods, in which the
proposal and acceptance/rejection of session descriptions are components
of the negotiation process.

In this section of the memo, we show how to create session descriptions
for MWPP streams. Sub-section 6.1 shows the session description format
for MWPP streams that layer directly onto RTP. Sub-section 6.2 shows the
session description format for MWPP streams that layer onto the
mpeg4-generic RTP packetization [4].

In sub-section 6.3, we introduce methods for enhancing these minimal
session descriptions, to better support real-world MWPP applications.
Sub-section 6.3 acts as a guide to the definitions of MWPP SDP
parameters that appear in Appendix C.1-5.

6.1 Session Description for MWPP over RTP

In this section, we show the session description syntax for MWPP streams
that layer directly onto RTP. For simplicity, we specialize the syntax
for unicast UDP transport. See [15] for the syntax for reliable TCP and
TLS transport, and see [9] for the syntax for unreliable multicast UDP
transport.

The minimal SDP stream description consists of three lines: a media (m=)
line, a connection data line (c=), and an rtpmap attribute line
(a=rtpmap). The media line acts to bind the UDP port number to an RTP
payload type and has the syntax:

m=audio <port number> RTP/AVP <payload type>

The connection line sets the IP number for the RTP stream, and has the
syntax:

c=IN IP4 <IP number>

The rtpmap line maps the payload type number to the MIME type for the
stream, and has the syntax:

a=rtpmap:<payload number> <mime-type>/<srate>[/<audio-channels>]

The MIME type for MWPP over RTP is mwpp.

The rtpmap line also sets the sample rate and the number of audio
channels.  For many MWPP applications, the <audio-channels> field is
irrelevant or redundant; we include it here for compatibility reasons.
Note that the square brackets around <audio-channels> mark it as an
optional field; the default value for <audio-channels> is 1 (mono).

We now show an example session description.  To set up an MWPP stream
over unicast UDP, at port 5004 on IP number 169.229.60.64, we use these
three SDP lines:

m=audio 5004 RTP/AVP 96
c=IN IP4 169.229.60.64
a=rtpmap:96 mwpp/44100

In this example, each packet in the stream has an RTP header PT field
value of 96 (see Section 2.1.1 for details). The sample rate for the RTP
timestamp is 44100 Hz (see Section 2.1.3 for details). The RTP stream
flows from sender to receiver over UDP port 5004. If the Real Time
Control Protocol (RTCP) is in use, a second unicast UDP stream flowing
from receiver to sender appears on port 5005. The low-bandwidth RTCP
stream carries information about the reception quality of the forward
channel (see [2] for details).
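
The three example lines can be generated mechanically, as in the
hypothetical Python helper below. It writes the rtpmap attribute with no
space after the colon (the usual SDP form), terminates lines with CRLF
as SDP specifies, and assumes the RTP convention of placing RTCP on the
next higher port.

```python
def mwpp_sdp(ip, port, payload_type, rate=44100):
    """Minimal MWPP-over-RTP stream description for unicast UDP over IPv4.

    Returns the CRLF-terminated SDP lines and the RTCP port (RTP port + 1).
    """
    lines = [
        f"m=audio {port} RTP/AVP {payload_type}",
        f"c=IN IP4 {ip}",
        f"a=rtpmap:{payload_type} mwpp/{rate}",
    ]
    return "".join(line + "\r\n" for line in lines), port + 1
```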

We describe this session description as minimal, because it does not
customize the stream.  Without such customization, an MWPP over RTP
session description has these default characteristics:

  1. If the stream uses unreliable transport (unicast UDP, multicast
     UDP, ...) the recovery journal system is in use, and the RTP
     payload contains both the MIDI command section and the recovery
     journal section. If the stream uses reliable transport (TCP,
     TLS, ...), the recovery journal system is not in use, and the
     RTP payload contains only the MIDI command section. See
     Section 2.2 for details.

  2. If the stream uses the recovery journal system, the format of
     the recovery journal is exactly as defined in Sections 4 and 5
     and Appendices A.1-8 and B.1-5 of this memo.

  3. In the MIDI command section of the payload, the command
     timestamps are interpreted as the command execution time, using
     the default semantics described in Section 3.

  4. An RTP packet does not have a defined maximum media time, and
     so the timestamp difference between adjacent packets in the
     stream may be arbitrarily large. See Section 2.1.3 for details.

  5. If more than one mwpp stream appears in a session description,
     the MIDI namespaces for these streams are independent: channel
     1 in the first stream does not reference the same MIDI channel
     as channel 1 in the second stream. In addition, the RTP timestamp
     fields for the streams do not necessarily share the same
     random offset value (see Section 2.1.3), and thus synchronization
     of the streams must use the generic RTP tools defined in [2].

  6. The session description does not specify the MIDI rendering method
     to be used with the stream.

In Section 6.3, we introduce SDP parameters to customize these
characteristics, via the inclusion of fmtp lines into the session
description.

6.2 Minimal Session Description for MWPP over mpeg4-generic

In this section, we show the SDP syntax to define an MWPP stream that is
layered onto the mpeg4-generic RTP payload [4]. The mpeg4-generic
layering supports MWPP stream rendering via one of the MPEG 4 Audio
codecs that support MIDI synthesis [5]:

  o General MIDI (Object Profile ID 14). This profile renders
    the MIDI stream using the General MIDI standard [1].

  o Wavetable Synthesis (Object Profile ID 13). This profile renders
    the MIDI stream using the DLS2 standard [18]. The session
    description includes the RIFF file to initialize the wavetable
    synthesis engine.

  o Main Synthetic (Object Profile ID 12). This profile renders
    the MIDI stream using Structured Audio [5], an algorithmic
    synthesis system based on the programming language SAOL. The
    session description includes the SAOL program and associated
    data.

A minimal mpeg4-generic session description uses the same media line,
connection line, and rtpmap line format as the session description
described in Section 6.1. The only difference is that the rtpmap line
uses mpeg4-generic instead of mwpp as the MIME type.

However, a minimal mpeg4-generic MWPP session description also sets the
value of several SDP parameters, using fmtp lines, to configure
mpeg4-generic. Two of these parameters (mode and streamtype) must be set
to specific constant values to create a legal mpeg4-generic MWPP
session. We show the proper initialization for these parameters in the
fmtp line below:

a=fmtp:<payload number> streamtype=5; mode=mwpp;

A third required parameter, profile-level-id, takes on the value 74 for
Main Synthetic (Object Profile ID 12), 75 for Wavetable Synthesis
(Object Profile ID 13), and 76 for General MIDI (Object Profile ID 14).

A fourth required parameter, config, is set to a double-quoted
hexadecimal string representation of the AudioSpecificConfig() binary
data block. Note that the format for AudioSpecificConfig() is shown in
[16]. For the Main Synthetic or Wavetable Synthesis profiles,
AudioSpecificConfig() codes the system initialization data (DLS2
samples, SAOL programs, etc).

The config parameter may also be set to the empty string.  This value
indicates that MWPP-specific SDP parameters code the
AudioSpecificConfig() data, as defined in Appendix C.5.1.

We now show an example mpeg4-generic session description. To set up a
minimal MWPP stream for mpeg4-generic to drive General MIDI (Object
Profile ID 14), we use the following four lines:

m=audio 5004 RTP/AVP 61
c=IN IP4 169.229.60.64
a=rtpmap:61 mpeg4-generic/44100
a=fmtp:61 streamtype=5; mode=mwpp; config="e4"; profile-level-id=76;

Each packet in the stream has an RTP header PT field value of 61 (see
Section 2.1.1 for details). The sample rate for the RTP timestamp is
44100 Hz (see Section 2.1.3 for details).

The profile-level-id value of 76 informs the receiver to render the MIDI
stream using the General MIDI object type. The config value is a
hexadecimal string encoding of the short AudioSpecificConfig() used by
General MIDI.
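
The required mpeg4-generic parameters can be assembled as in the
hypothetical Python helper below, which maps the object type to
profile-level-id and hex-encodes an AudioSpecificConfig() block that the
caller supplies (building that block is outside this sketch). For the
one-octet block 0xE4 it yields the parameters of the example above,
using the standard SDP attribute name fmtp.

```python
PROFILE_LEVEL_ID = {12: 74, 13: 75, 14: 76}   # object type -> profile-level-id


def mpeg4_generic_fmtp(payload_type, object_type, audio_specific_config):
    """Build the required fmtp line for mpeg4-generic in mode mwpp.

    object_type: 12 (Main Synthetic), 13 (Wavetable Synthesis), or
    14 (General MIDI). audio_specific_config: AudioSpecificConfig() as bytes.
    """
    plid = PROFILE_LEVEL_ID[object_type]
    config = audio_specific_config.hex()          # lowercase hex digits
    return (f"a=fmtp:{payload_type} streamtype=5; mode=mwpp; "
            f'config="{config}"; profile-level-id={plid};')
```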

The RTP stream flows from sender to receiver over unicast UDP, at port
5004 on IP number 169.229.60.64.  If the Real Time Control Protocol
(RTCP) is in use, a second unicast UDP stream flowing from receiver to
sender appears on port 5005. The low-bandwidth RTCP stream carries
information about the reception quality of the forward channel (see [2]
for details).

We describe this session description as minimal, because it defines the
SDP parameters that are required for mpeg4-generic operation, but does
not customize the stream via additional SDP parameters.

In Section 6.1, we describe the behavior of a minimal MWPP stream that
is sent directly over RTP, as a numbered list of characteristics.
Characteristics 1-4 on that list also describe the minimal MWPP session
layered onto mpeg4-generic, but characteristics 5 and 6 require
restatement for MWPP over mpeg4-generic, as listed below:

  5. If more than one mpeg4-generic stream in mode mwpp appears in
     a session description, each stream denotes an independent
     instance of an MPEG 4 synthesizer of the object type coded in
     the profile-level-id parameter. In addition, the RTP timestamp
     fields for the streams do not necessarily share the same
     random offset value (see Section 2.1.3), and thus synchronization
     of the streams must use the generic RTP tools defined in [2].

  6. The size of the encoded AudioSpecificConfig() string for
     the config parameter must abide by the size restrictions of
     the IETF tool that manages the mpeg4-generic stream. For some
     tools, like SIP over UDP [10], the config value string size
     might be limited to about 1500 octets or less. Many real-world
     AudioSpecificConfig() blocks encode into config value strings
     that are larger than 1500 octets.
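
Because the config value is hexadecimal, each octet of
AudioSpecificConfig() costs two characters. The hypothetical Python
sketch below applies that arithmetic to the roughly 1500-octet figure
mentioned above. It counts only the config parameter itself, so in
practice the rest of the session description leaves even less room.

```python
CONFIG_OVERHEAD = len('config=""')   # parameter name, '=', and two quotes


def config_value_size(block_len):
    """Octets taken by config="..." for an AudioSpecificConfig() of block_len octets."""
    return CONFIG_OVERHEAD + 2 * block_len      # two hex digits per octet


def largest_block(budget=1500):
    """Largest AudioSpecificConfig() (in octets) whose config value fits the budget."""
    return max(0, (budget - CONFIG_OVERHEAD) // 2)
```

Under these assumptions, only blocks of at most 745 octets fit, which is
consistent with the observation that many real-world blocks do not.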

In Section 6.3, we introduce SDP parameters to customize these
characteristics, via the inclusion of fmtp lines into the session
description.

6.3 MWPP SDP Parameters

This section introduces optional MWPP session description parameters, to
add MWPP functionality beyond the minimal streams described in Sections
6.1 and 6.2. In this section, we briefly discuss the purpose of each
parameter, and reference the Appendix C sub-section that contains the
complete parameter description.

To use an optional parameter in a session description, include an fmtp
line to set the parameter value, in the position mandated by [9]. The
syntax for fmtp lines appears below (see Section 6.2 for usage
examples).

a=fmtp:<payload number> <param1>=<value1>; <param2>=<value2>; ...

The MWPP optional parameters provide several distinct sets of services:

  o  Recovery journal customization. The recj parameter configures
     the presence or absence of a recovery journal in a stream.
     The chmay, chnever, and chmust parameters configure the
     chapter types that appear in the recovery journal. These
     parameters are described in Appendix C.1, and override
     the default stream behaviors 1 and 2 listed in Section 6.1
     and referenced in Section 6.2.

  o  MIDI command timestamp semantics. The tsmode, octpos,
     mperiod, and linerate parameters customize the semantics
     of the timestamps that label commands in the MIDI command
     section. These parameters let MWPP accurately encode the
     implicit time coding of the MIDI wire protocol. These
     parameters are described in Appendix C.2, and override
     default stream behavior 3 listed in Section 6.1 and
     referenced in Section 6.2.

  o  Media time limits. The standard SDP parameter maxptime
     sets the maximum media time of an MWPP RTP packet, and
     as a consequence imposes a minimum sending rate for MWPP.
     This feature benefits algorithms performing clock-skew
     compensation, network latency estimation, and packet loss
     recovery. This parameter is described in Appendix C.3, and
     overrides default stream behavior 4 listed in Section 6.1
     and referenced in Section 6.2.

  o  Multiple streams. The midiport SDP parameter supports mapping
     multiple MWPP streams to the same MIDI namespace (for
     the mwpp media type) or to the same instance of an MPEG 4
     object type (for the mpeg4-generic media type in mode mwpp).
     The zerosync SDP parameter provides an alternative way to
     synchronize multiple MWPP streams. These parameters are
     described in Appendix C.4, and override default stream
     behavior 5 in Sections 6.1 and 6.2.

  o  MIDI rendering. An extensible set of SDP parameters supports
     the specification of the MWPP rendering method, for both
     MWPP over RTP and MWPP over mpeg4-generic streams. These
     parameters are described in Appendix C.5 and override default
     stream behavior 6 in Sections 6.1 and 6.2.
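
A receiver reading these parameters must first split the fmtp line. The
following Python sketch is illustrative only: the function name is
hypothetical, values are kept as raw strings (including any double
quotes), an optional space after the colon and a trailing semicolon are
tolerated, and quoted values containing semicolons are not handled.

```python
def parse_fmtp(line):
    """Split an fmtp attribute line into (payload type, {param: value})."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an fmtp attribute line")
    pt, _, params = line[len(prefix):].strip().partition(" ")
    values = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue                      # skip empty items, e.g. a trailing ';'
        name, _, value = item.partition("=")
        values[name.strip()] = value.strip()
    return int(pt), values
```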


7. Security Considerations

Cryptographic authentication of incoming RTP and RTCP packets is highly
recommended when using MWPP. Without such protections, attackers could
forge MIDI commands into an ongoing stream, potentially damaging
speakers and eardrums. An attacker could also craft RTP and RTCP packets
to exploit known bugs in the client, and take effective control of a
client machine.

The session management tool should also use cryptographic authentication
on all session descriptions, as spoofed AudioSpecificConfig() data
blocks are a second powerful point of entry for attackers.

The zerosync SDP parameter (described in Appendix C.4.2) impairs a
security feature of RTP. In standard RTP, the RTP timestamp is
initialized to a randomly chosen value, to reduce the predictability of
RTP header values. If the zerosync SDP parameter is used with a non-zero
value in a stream description, and a plain-text session description is
snooped, an attacker knows the randomly chosen RTP timestamp offset for
the stream.

If the zerosync SDP parameter is used with a zero value for several
stream descriptions in a session, all of these streams use the same
randomly chosen RTP timestamp offset, and so an attacker may find this
offset value easier to determine.

The sasc rendering value for the SDP render parameter (defined in
Appendix C.5.1) supports the inclusion of AudioSpecificConfig() data by
reference, using the url parameter. If this url is spoofed, an attacker
could change the session configuration in an arbitrary way, and thus
forge an attack on the MPEG 4 client.


8. Congestion Control

MWPP has congestion control issues that are unique among RTP audio
packetizations. In certain applications, such as network musical
performance [6], the packet rate is linked to the gestural rate of a
human performer.

MWPP implementations SHOULD sense the MIDI wire protocol stream for
command patterns that result in excessive packet rates, and filter these
streams as part of MWPP to reduce the packet rate.


9. Acknowledgements

We thank the networking, media compression, and computer music community
members who have contributed to the MWPP standardization effort,
including Steve Casner, Robin Davies, Dominique Fober, Philippe Gentric,
Chris Grigg, Phil Kerr, Young-Kwon Lim, Jan van der Meer, Colin Perkins,
Larry Rowe, Dave Singer, Martijn Sipkema, and Giorgio Zoia.


Appendix A. The Recovery Journal Channel Chapters


Appendix A.1. Recovery Journal Definitions

In this Appendix, we define the terminology and the coding idioms that
are used in the recovery journal bitfield descriptions in Section 5
(journal header structure), Appendices A.2-8 (channel journal chapters)
and Appendices B.1-5 (system journal chapters).

These descriptions assume that the recovery journal resides in an RTP
packet with sequence number I ("packet I") and that the Checkpoint
Packet Seqnum field in the top-level recovery journal header refers to a
packet with sequence number C. Sequence number algorithms defined for
the recovery journal system use modulo 2^16 arithmetic.
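
The modulo 2^16 arithmetic can be illustrated with two small
hypothetical Python functions. The first lists the packets whose MIDI
command sections make up the checkpoint history (a term defined below);
the second checks whether that history reaches back far enough to cover
a loss, given the sequence number of the last packet the receiver
actually received. The sketch assumes fewer than 2^15 consecutive
packets are lost, so that modular differences are unambiguous.

```python
MOD = 1 << 16   # RTP sequence numbers are 16 bits


def checkpoint_history_seqnums(c, i):
    """Seqnums C through I - 1 (mod 2^16); empty when C == I."""
    return [(c + k) % MOD for k in range((i - c) % MOD)]


def journal_covers_loss(last_received, c, i):
    """True if lost packets last_received + 1 .. i - 1 all lie in the history."""
    history_len = (i - c) % MOD
    lost = (i - last_received - 1) % MOD
    return history_len >= lost
```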

Several bitfield coding idioms appear throughout the recovery journal
system, with consistent semantics. Most recovery journal elements begin
with an "S" (Single-packet loss) bit. S bits are designed to help
receivers efficiently parse through the recovery journal hierarchy in
the common case of the loss of a single packet.

The default value of the S bit is 1. An S bit for a recovery journal
element in packet I is set to 0 if the element encodes data about a MIDI
command stored in the MIDI command section of packet I - 1. If an
element has its S bit set to 0, all higher-level recovery journal
elements that contain it also have S bits that are set to 0, including
the top-level recovery journal header (Figure 7 in Section 5).
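A minimal Python sketch of the S bit rule follows. It assumes a hypothetical in-memory tree of journal elements (the JournalElement class is not defined by MWPP) and clears S bits bottom-up, so that every element containing a packet I - 1 element also has S = 0.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JournalElement:
    """Hypothetical in-memory node for one recovery journal element."""
    name: str
    codes_prev_packet: bool = False  # element itself codes a packet I - 1 command
    children: List["JournalElement"] = field(default_factory=list)
    s_bit: int = 1                   # default value of the S bit


def assign_s_bits(elem: JournalElement) -> int:
    """Set S bits bottom-up and return this element's S bit.

    S is 0 if the element, or any element it contains, codes a command
    from packet I - 1. Every child is visited so all S bits get set.
    """
    s = 0 if elem.codes_prev_packet else 1
    for child in elem.children:
        if assign_s_bits(child) == 0:
            s = 0
    elem.s_bit = s
    return s
```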

Other coding idioms that appear with consistent semantics throughout the
recovery journal system are described below.

  o R flag bit. R flag bits are reserved for future use by MWPP.
    Senders MUST set R bits to 0; receivers MUST ignore R bit values.

  o LENGTH field. All fields named LENGTH (as distinct from LEN)
    code the number of octets in the structure that contains it,
    including the header it resides in and all hierarchical levels
    below it. This definition simplifies parsing, as receivers may
    skip over the entire structure with an addition operation.
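To illustrate the LENGTH idiom, this hypothetical Python sketch reads the 10-bit LENGTH field of a Chapter M header (Figure A.8.1 in Appendix A.8) and skips the whole chapter with one addition. The helper names and the bounds check are assumptions of the sketch.

```python
def chapter_m_length(buf: bytes, offset: int) -> int:
    """Return the 10-bit LENGTH field of a Chapter M header.

    The header layout is S|P|N|R|R|R|LENGTH, so LENGTH is the low 2 bits
    of the first octet followed by all 8 bits of the second octet.
    """
    return ((buf[offset] & 0x03) << 8) | buf[offset + 1]


def skip_chapter_m(buf: bytes, offset: int) -> int:
    """Return the offset just past Chapter M.

    LENGTH counts the header and every level below it, so a single
    addition skips the whole structure.
    """
    length = chapter_m_length(buf, offset)
    if length < 2 or offset + length > len(buf):
        raise ValueError("malformed Chapter M LENGTH")
    return offset + length
```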

We now define normative terms used to describe recovery journal
semantics.

  o Checkpoint history. The checkpoint history of a recovery journal
    is the concatenation of the MIDI command sections of packets C
    through I - 1. The last MIDI command in the MIDI command section for
    packet I - 1 is considered the most recent command; the first
    MIDI command in the MIDI command section for packet C is
    the oldest command. A checkpoint history with no MIDI commands
    is considered to be empty. The checkpoint history never contains
    the MIDI command section of packet I (the packet containing
    the recovery journal), so if C == I, the checkpoint history is
    empty by definition.

  o Session history. The session history of a recovery journal is
    the concatenation of MIDI command sections from the first
    packet of the session up to packet I - 1. The definitions of
    MIDI command recency and history emptiness are the same as in
    the checkpoint history. The session history never contains the
    MIDI command section of packet I, and so the session history of
    the first packet in the session is empty by definition.

  o Finished/unfinished commands. If all octets of a MIDI command
    appear in the session history, the command is defined to be
    finished. If some but not all octets of a MIDI command appear
    in the session history, the command is defined to be unfinished.
    Unfinished commands occur if segments of a SysEx command appear
    in several RTP packets. For example, if a SysEx command is coded
    as 3 segments, with segment 1 in packet K, segment 2 in packet
    K + 1, and segment 3 in packet K + 2, the session histories for
    packets K + 1 and K + 2 contain unfinished versions of the command.

  o Active commands (default). For most types of MIDI commands,
    an active MIDI command is defined to be a MIDI command that does
    not appear before one of the following MIDI commands in the session
    history:  System Reset (0xFF), General MIDI System Enable
    (0xF0 0x7E 0xcc 0x09 0x01 0xF7), General MIDI System Disable
    (0xF0 0x7E 0xcc 0x09 0x00 0xF7). A few types of MIDI commands
    use a modified meaning of active (see below).

  o Active commands (NoteOn, NoteOff, Poly Aftertouch). For MIDI NoteOn,
    NoteOff, and Poly Aftertouch commands, an active MIDI command is
    defined to be a MIDI command that does not appear before one of the
    following MIDI commands in the session history: System Reset (0xFF),
    General MIDI System Enable (0xF0 0x7E 0xcc 0x09 0x01 0xF7), General
    MIDI System Disable (0xF0 0x7E 0xcc 0x09 0x00 0xF7), MIDI Control
    Change number 120 (All Sound Off) or 123 (All Notes Off).

  o Active commands (MIDI Control Change). For MIDI Control Change
    commands, an active MIDI command is defined to be a MIDI command
    that does not appear before one of the following MIDI commands in
    the session history: System Reset (0xFF), General MIDI System Enable
    (0xF0 0x7E 0xcc 0x09 0x01 0xF7), General MIDI System Disable
    (0xF0 0x7E 0xcc 0x09 0x00 0xF7), MIDI Control Change number 121
    (All Controllers Off).
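The Python sketch below applies the three definitions of an active command to a session history held as a list of complete MIDI commands. It is illustrative only. It assumes that Control Change cancellation is checked on the command's own channel, and it uses the MIDI 1.0 controller assignments 120 (All Sound Off) and 123 (All Notes Off) for the NoteOn, NoteOff, and Poly Aftertouch case. All names are hypothetical.

```python
from typing import Sequence, Tuple

Command = Tuple[int, ...]  # one complete MIDI command, status octet first

NOTE_CANCELLERS = {120, 123}   # All Sound Off, All Notes Off (MIDI 1.0 numbering)
CONTROLLER_CANCELLERS = {121}  # All Controllers Off


def is_system_cancel(cmd: Command) -> bool:
    """System Reset, GM System Enable, or GM System Disable (any device ID)."""
    if cmd == (0xFF,):
        return True
    return (len(cmd) == 6 and cmd[0] == 0xF0 and cmd[1] == 0x7E
            and cmd[3] == 0x09 and cmd[4] in (0x00, 0x01) and cmd[5] == 0xF7)


def cancels(cmd: Command, later: Command) -> bool:
    """True if `later` makes the earlier command `cmd` inactive."""
    if is_system_cancel(later):
        return True
    kind, channel = cmd[0] & 0xF0, cmd[0] & 0x0F
    if later[0] != (0xB0 | channel):  # only a Control Change on the same channel
        return False
    if kind in (0x80, 0x90, 0xA0):    # NoteOff, NoteOn, Poly Aftertouch
        return later[1] in NOTE_CANCELLERS
    if kind == 0xB0:                  # Control Change
        return later[1] in CONTROLLER_CANCELLERS
    return False                      # other types use the default definition


def is_active(history: Sequence[Command], index: int) -> bool:
    """True if history[index] does not appear before a cancelling command."""
    cmd = history[index]
    return not any(cancels(cmd, later) for later in history[index + 1:])
```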

The chapter definitions in Appendices A.2-8 and B.1-5 reflect the
default recovery journal behavior of MWPP. The chmay, chmust, and
chnever SDP parameters modulate these definitions, as described in
Appendix C.1.2.

Finally, we note that channel journals only encode information about
MIDI commands appearing on the MIDI channel the journal protects. All
references to MIDI commands in Appendices A.2-8 should be read as "MIDI
commands appearing on this channel."


Appendix A.2. Chapter P: MIDI Program Change

A channel journal MUST contain Chapter P if an active Program Change
(0xC) command appears in the checkpoint history.  Figure A.2.1 shows the
format for Chapter P.

         0                   1                   2
         0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |S|   PROGRAM   |C| BANK-COARSE |F| BANK-FINE   |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure A.2.1 -- Chapter P Format

The chapter has a fixed size of 24 bits.  The PROGRAM field indicates
the program value of the most recent Program Change command in the
checkpoint history.

By default, bits 8-23 of Chapter P are set to 0.  However, if an active
Control Change (0xB) command for controller 0 (Bank Select Coarse)
appears before this Program Change command in the session history, the C
bit is set to 1, and the BANK-COARSE field is set to the 7-bit data
value for the most recent Control Change command for controller 0. The F
bit and BANK-FINE field code the Control Change command for controller
32 (Bank Select Fine) in an identical manner.
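A hypothetical Python encoder for the 24-bit Chapter P layout is sketched below. Passing None for a bank field models the default case in which no qualifying Bank Select command exists, leaving the C or F bit and its 7-bit field at 0.

```python
from typing import Optional


def encode_chapter_p(program: int, s_bit: int = 1,
                     bank_coarse: Optional[int] = None,
                     bank_fine: Optional[int] = None) -> bytes:
    """Pack the 24-bit Chapter P: S|PROGRAM, C|BANK-COARSE, F|BANK-FINE."""
    def bank_octet(value: Optional[int]) -> int:
        if value is None:
            return 0x00                 # default: flag bit and field are 0
        return 0x80 | (value & 0x7F)    # flag bit 1, then the 7-bit value
    return bytes([(s_bit << 7) | (program & 0x7F),
                  bank_octet(bank_coarse),
                  bank_octet(bank_fine)])
```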


Appendix A.3. Chapter W: MIDI Pitch Wheel

A channel journal MUST contain Chapter W if an active MIDI Pitch Wheel
(0xE) command appears in the checkpoint history.  Figure A.3.1 shows the
format for Chapter W.

                 0                   1
                 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                |S|     FIRST   |R|    SECOND   |
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure A.3.1 -- Chapter W Format

The chapter has a fixed size of 16 bits.  The FIRST and SECOND fields
are the 7-bit values of the first and second data octets of the most
recent active Pitch Wheel command in the checkpoint history.


Appendix A.4. Chapter N: MIDI NoteOff and NoteOn

In this Appendix, we consider NoteOn commands with zero velocity to be
NoteOff commands.

A channel journal MUST contain Chapter N if an active MIDI NoteOn (0x9)
or NoteOff (0x8) command appears in the checkpoint history. Figure A.4.1
shows the format for Chapter N.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |B|     LEN     |  LOW  | HIGH  |S|   NOTENUM   |Y|  VELOCITY   |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |S|   NOTENUM   |Y|  VELOCITY   | ....                          |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |   BITFIELD    |   BITFIELD    |     ....      |   BITFIELD    |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure A.4.1 -- Chapter N Format

Chapter N codes the most recent active NoteOn or NoteOff reference to a
MIDI note number in the checkpoint history.  Chapter N consists of a
2-octet header, followed by at least one of the following data
structures:

   o A list of note logs to code NoteOn commands.
   o A NoteOff bitfield structure to code NoteOff commands.

The note log list MUST contain an entry for all note numbers whose most
recent checkpoint history appearance is in a NoteOn command.  The
NoteOff bitfield structure MUST contain a set bit for all note numbers
whose most recent checkpoint history appearance is in a NoteOff command.
A note number is never coded in both structures.

The header for Chapter N, reproduced in Figure A.4.2, codes the size of
the note list and bitfield structures.

                 0                   1
                 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                |B|     LEN     |  LOW  | HIGH  |
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure A.4.2 -- Chapter N Header

The 7-bit LEN field codes the number of 2-octet note logs in the note
list. Zero is a valid value for LEN, and codes the empty note list.  A
LEN value of 127 serves double duty, coding a note list length of 128
note logs (if LOW = 0xF and HIGH = 0x0) or 127 note logs (for any other
LOW/HIGH combination). This mechanism supports the unlikely, but legal,
condition of 128 concurrent NoteOn commands, one for each note number.

The 4-bit LOW and HIGH fields code the number of NoteOff bitfield octets
that follow the note log list. LOW and HIGH are unsigned integer values.
If LOW is less than or equal to HIGH, there are (HIGH - LOW + 1) NoteOff
bitfield octets in the chapter. An empty NoteOff bitfield structure is
coded by setting LOW to 15 and HIGH to 0 or 1.

The B bit is set to 1 if the MIDI command section of packet I - 1 does
not include a NoteOff command for this channel. The B bit, like the S
bit (Appendix A.1), helps receivers efficiently parse recovery journals
in the common case of the loss of a single packet.
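The following Python sketch decodes the Chapter N header and computes the total chapter size. It is a non-normative illustration with hypothetical names. In particular, it treats every LOW > HIGH combination as an empty NoteOff bitfield structure, an interpretation assumed by this sketch.

```python
from typing import NamedTuple


class ChapterNHeader(NamedTuple):
    b_bit: int
    note_logs: int        # number of 2-octet note logs
    low: int
    high: int
    bitfield_octets: int  # number of NoteOff bitfield octets


def parse_chapter_n_header(hdr: bytes) -> ChapterNHeader:
    """Decode the 2-octet header B|LEN, LOW|HIGH."""
    b_bit = hdr[0] >> 7
    length = hdr[0] & 0x7F
    low, high = hdr[1] >> 4, hdr[1] & 0x0F
    # LEN = 127 codes 128 note logs only when LOW = 15 and HIGH = 0.
    note_logs = 128 if (length == 127 and low == 15 and high == 0) else length
    # (HIGH - LOW + 1) octets when LOW <= HIGH; otherwise assumed empty.
    bitfield_octets = high - low + 1 if low <= high else 0
    return ChapterNHeader(b_bit, note_logs, low, high, bitfield_octets)


def chapter_n_size(hdr: bytes) -> int:
    """Total chapter size in octets: header, note logs, bitfield octets."""
    h = parse_chapter_n_header(hdr)
    return 2 + 2 * h.note_logs + h.bitfield_octets
```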

We now describe the 2-octet note log structure, reproduced in Figure
A.4.3.

                 0                   1
                 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                |S|   NOTENUM   |Y|  VELOCITY   |
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure A.4.3 -- Chapter N Note Log

The 7-bit NOTENUM field codes the note number for the log; a note number
may not be represented by multiple note logs in the note list.  The
7-bit VELOCITY field codes the velocity value for the most recent NoteOn
command for the note number in the checkpoint history. VELOCITY is never
zero; NoteOn commands with zero velocity are coded as NoteOff commands
in the NoteOff bitfield structure.

The note log does not code the execution time of the NoteOn command.
However, the Y bit codes a hint from the sender about the NoteOn
execution time. This hint takes the form of a recommendation to play (Y
= 1) or skip (Y = 0) a recovered NoteOn command from this log.  More
specifically, Y is set to 1 if the NoteOn command coded by the note log
is considered to be simultaneous with the RTP timestamp of the packet
that contains the note log. The metric used to judge simultaneity is
implementation dependent.

We now describe the NoteOff bitfield structure.  A NoteOff bitfield
octet codes NoteOff information for eight consecutive MIDI note numbers,
with the MSB representing the lowest note number. The MSB of the first
bitfield octet codes the note number 8*LOW; the MSB of the last bitfield
octet codes the note number 8*HIGH.

A set bit codes a NoteOff command for the note number; Chapter N does
not code NoteOff velocity data.  In the most efficient coding for the
NoteOff bitfield structure, the first and last octets of the structure
contain at least one set bit.
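A minimal Python sketch of the NoteOff bitfield coding follows. The encoder picks LOW and HIGH from the lowest and highest note numbers, which yields the efficient coding described above. The function names are hypothetical.

```python
from typing import Iterable, Set, Tuple


def encode_noteoff_bitfield(notes: Iterable[int]) -> Tuple[int, int, bytes]:
    """Return (LOW, HIGH, bitfield octets) for a set of NoteOff note numbers."""
    ordered = sorted(set(notes))
    if not ordered:
        return 15, 0, b""                          # empty NoteOff bitfield
    low, high = ordered[0] // 8, ordered[-1] // 8  # tightest octet range
    octets = bytearray(high - low + 1)
    for n in ordered:
        octets[n // 8 - low] |= 0x80 >> (n % 8)    # MSB = lowest note number
    return low, high, bytes(octets)


def decode_noteoff_bitfield(low: int, octets: bytes) -> Set[int]:
    """Recover the note numbers coded by bitfield octets starting at 8*LOW."""
    return {8 * (low + i) + bit
            for i, octet in enumerate(octets)
            for bit in range(8)
            if octet & (0x80 >> bit)}
```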


Appendix A.5. Chapter A: MIDI Poly Aftertouch

A channel journal MUST contain Chapter A if an active Poly Aftertouch
(0xA) command appears in the checkpoint history.  Figure A.5.1 shows the
format for Chapter A.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|    LEN      |S|   NOTENUM   |R|  PRESSURE   |S|   NOTENUM   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R|  PRESSURE   |  ....                                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure A.5.1 -- Chapter A format

The chapter consists of a 1-octet header, followed by a variable length
list of 2-octet note logs. A note log MUST appear for a note number if
an active Poly Aftertouch command for the note number appears in the
checkpoint history.  A note number may not be represented by multiple
note logs in the note list.

The 7-bit LEN field codes the number of note logs in the list, minus
one. Figure A.5.2 reproduces the note log structure of Chapter A.

                 0                   1
                 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                |S|   NOTENUM   |R|  PRESSURE   |
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure A.5.2 -- Chapter A Note Log

The 7-bit PRESSURE field codes the pressure value of the most recent
Poly Aftertouch command for this note number in the checkpoint history.
The MIDI note number for this command is coded in the 7-bit NOTENUM
field.
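The Python sketch below packs Chapter A from a mapping of note numbers to pressure values. It is illustrative. The draft does not fix the order of note logs, so the sketch sorts them by note number, and the caller-supplied set of packet I - 1 notes is a hypothetical stand-in for the sender's bookkeeping.

```python
from typing import AbstractSet, Dict


def encode_chapter_a(pressures: Dict[int, int],
                     prev_packet_notes: AbstractSet[int] = frozenset()) -> bytes:
    """Pack Chapter A: an S|LEN header, then S|NOTENUM, R|PRESSURE logs.

    `pressures` maps each note number to the pressure of its most recent
    Poly Aftertouch command; notes in `prev_packet_notes` get S = 0.
    """
    if not 1 <= len(pressures) <= 128:
        raise ValueError("Chapter A holds 1 to 128 note logs")
    logs = bytearray()
    header_s = 1
    for note in sorted(pressures):
        s = 0 if note in prev_packet_notes else 1
        header_s &= s                          # any S = 0 log clears header S
        logs += bytes([(s << 7) | (note & 0x7F),   # S | NOTENUM
                       pressures[note] & 0x7F])    # R = 0 | PRESSURE
    # LEN codes the number of note logs minus one.
    return bytes([(header_s << 7) | (len(pressures) - 1)]) + bytes(logs)
```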


Appendix A.6. Chapter T: MIDI Channel Aftertouch

A channel journal MUST contain Chapter T if an active MIDI Channel
Aftertouch (0xD) command appears in the checkpoint history.  Figure
A.6.1 shows the format for Chapter T.

                        0
                        0 1 2 3 4 5 6 7
                       +-+-+-+-+-+-+-+-+
                       |S|   PRESSURE  |
                       +-+-+-+-+-+-+-+-+

                Figure A.6.1 -- Chapter T Format

The chapter has a fixed size of 8 bits.  The 7-bit PRESSURE field holds
the pressure value of the most recent active Channel Aftertouch command
sent on this channel.


Appendix A.7. Chapter C: MIDI Control Change

A channel journal MUST contain Chapter C if an active Control Change
(0xB) command appears in the checkpoint history (excepting controller
numbers 0, 6, 32, 38, 96, 97, 98, 99, 100, 101, 124, 125, 126, and
127). In certain cases (defined later in this Appendix) this rule also
applies to the excepted controller numbers. Figure A.7.1 shows the
format for Chapter C.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|     LEN     |S|   NUMBER    |A|  VALUE/ALT  |S|   NUMBER    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |A| VALUE/ALT   |  ....                                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure A.7.1 -- Chapter C format

The chapter consists of a 1-octet header, followed by a variable length
list of 2-octet controller logs.  The list MUST contain an entry for a
controller number if an active Control Change command for the number
appears in the checkpoint history (excepting numbers 0, 6, 32, 38, 96,
97, 98, 99, 100, 101, 124, 125, 126, and 127). In certain cases (defined
later in this Appendix) this rule also applies to the excepted
controller numbers.

The 7-bit LEN field codes the number of controller logs in the list,
minus one.  A controller number may not appear in multiple controller
logs in the list. Figure A.7.2 reproduces the controller log structure
of Chapter C.

                 0                   1
                 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                |S|    NUMBER   |A|  VALUE/ALT  |
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure A.7.2 -- Chapter C Controller Log

The 7-bit NUMBER field identifies the controller number. The 7-bit
VALUE/ALT field codes recovery information for the most recent Control
Change command for this number in the checkpoint history.

Chapter C provides three tools for coding recovery information for a
command in the VALUE/ALT field: the value tool, the toggle tool, and the
count tool. Implementations may choose among the tools to code a Control
Change command.

In the value tool, the 7-bit VALUE field codes the control value of the
most recent Control Change command for this controller number.  This
tool works best for controllers that code a continuous quantity, such as
number 1 (Modulation Wheel). If the value tool is chosen, the A bit is
set to 0.

The A bit is set to 1 to code the toggle or count tool. These tools work
best for controllers that code discrete actions.  Figure A.7.3 shows the
controller log for these tools.

                0                   1
                0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
               |S|    NUMBER   |1|T|    ALT    |
               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure A.7.3 -- Controller Log for ALT tools

The T flag is set to 1 to code the toggle tool; T is set to 0 to code
the count tool. Both methods use the 6-bit ALT field as an unsigned
integer.

The toggle tool works best for controllers that act as on/off switches,
such as 64 (Hold Pedal). These controllers code the "off" state with
control values 0-63 and the "on" state with 64-127. The ALT field codes
the total number of toggles (off->on and on->off) due to Control Change
commands in the session history. Toggle counting is performed modulo 64,
and the controller is assumed to be off at the start of a session.

The Hold Pedal controller illustrates the benefit of the toggle tool
over the value tool for switch controllers. As often used in piano
applications, the "on" state of the Hold Pedal lets notes resonate,
while the "off" state immediately damps notes to silence. The loss of
the "off" command in an "on->off->on" sequence results in ringing notes
that should have been damped silent.  The toggle tool lets receivers
detect this lost "off" command but the value tool does not.
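The Hold Pedal scenario can be worked through with the short Python sketch below (all function names are hypothetical). The sender counts toggles modulo 64. A receiver that compares the ALT value against its own count learns how many toggles it missed. Because 64 is even, the parity of ALT also gives the current switch state, given that the controller starts the session in the off state.

```python
ALT_MOD = 64  # ALT is a 6-bit unsigned field


def toggle_count(values) -> int:
    """Count off->on and on->off transitions, modulo 64.

    Control values 0-63 mean "off" and 64-127 mean "on"; the controller
    is assumed to be off at the start of the session.
    """
    on, toggles = False, 0
    for v in values:
        now_on = v >= 64
        if now_on != on:
            toggles = (toggles + 1) % ALT_MOD
            on = now_on
    return toggles


def missed_toggles(journal_alt: int, local_count: int) -> int:
    """Toggles a receiver missed, assuming fewer than 64 were lost."""
    return (journal_alt - local_count) % ALT_MOD


def switch_is_on(journal_alt: int) -> bool:
    """64 is even, so an odd toggle count still means "on" after wrapping."""
    return journal_alt % 2 == 1
```

For the on, off, on sequence (values 127, 0, 127) the sender's ALT is 3. A receiver that lost the "off" command has seen 127, 127 and counted 1 toggle, so missed_toggles(3, 1) returns 2 and exposes the lost damping. A value tool log holding 127 would match the receiver's state and hide the loss.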

The count tool is similar to the toggle tool, but is optimized for
controllers whose value octet is ignored, such as 120 (All Sound Off).
For the count tool, the ALT field codes the total number of Control
Change commands in the session history. Command counting is performed
modulo 64, and the command count is set to 0 at the start of the
session.
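As a sketch of the three tools, the hypothetical Python helpers below pack a 2-octet controller log using the value tool (Figure A.7.2) or the toggle and count tools (Figure A.7.3), and compute a count tool ALT value from a list of (controller number, value) pairs.

```python
from typing import Iterable, Tuple


def controller_log_value(number: int, value: int, s_bit: int = 1) -> bytes:
    """Value tool: S|NUMBER, then A = 0 and the 7-bit control value."""
    return bytes([(s_bit << 7) | (number & 0x7F), value & 0x7F])


def controller_log_alt(number: int, alt: int, toggle: bool, s_bit: int = 1) -> bytes:
    """Toggle (T = 1) or count (T = 0) tool: S|NUMBER, then 1|T|ALT."""
    return bytes([(s_bit << 7) | (number & 0x7F),
                  0x80 | (int(toggle) << 6) | (alt % 64)])


def count_tool_alt(history: Iterable[Tuple[int, int]], number: int) -> int:
    """Count tool: Control Change commands for `number`, modulo 64."""
    return sum(1 for n, _ in history if n == number) % 64
```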

We now describe normative coding rules for the controller numbers that
are excepted from the general rules presented in the beginning of this
Appendix. For each excepted controller number, we define the conditions
under which a controller log MUST appear in Chapter C for the controller
number. By extension, these conditions imply that Chapter C MUST appear
in the recovery journal.

If active Control Change commands for controller numbers 0 (Bank Select
Coarse) or 32 (Bank Select Fine) appear in the checkpoint history, the
most recent commands for these numbers MUST appear as entries in the
controller list if, and only if, their data values are not coded in the
BANK-COARSE (0) or BANK-FINE (32) fields of Chapter P (Appendix A.2)
for the channel journal. This rule avoids redundant coding in Chapters C
and P.

Several controller number pairs are defined to be mutually exclusive.
Controller numbers 124 (Omni Off) and 125 (Omni On) form a mutually
exclusive pair, as do controller numbers 126 (Mono) and 127 (Poly).  If
active Control Change commands for one or both members of a mutually
exclusive pair appear in the checkpoint history, one and only one
controller log MUST appear in the controller list to code the pair.
This controller log MUST code the controller number of the most recent
Control Change command of the pair.

Appendix A.8 defines Chapter M, the MIDI Parameter chapter, to provide
resiliency for the MIDI registered/non-registered parameter system.
Here, we define the Chapter C rules for coding Control Change commands
related to the registered/non-registered parameter system. These Chapter
C rules serve to minimize redundancy with Chapter M.

Control Change commands for controller numbers 6 and 38 (Data Slider)
and 96 and 97 (Data Button) may be used as part of the parameter system,
or may be used as general-purpose controllers. Control Change commands
for controller numbers 6, 38, 96, or 97 that appear in the checkpoint
history, and that are used in the parameter system, MUST NOT appear as
entries in the controller list.

However, if active Control Change commands for controller numbers 6, 38,
96, or 97 appear in the checkpoint history, and these commands are used
as general-purpose controllers, the most recent general-purpose command
instance for these numbers MUST appear as entries in the controller
list.

A parameter system transaction begins with paired Control Change
commands for numbers 98 and 99 (Non-Registered Parameter LSB and MSB) or
100 and 101 (Registered Parameter LSB and MSB). Chapter M codes these
paired Control Change commands. The Chapter C rule below codes
"unpaired" commands for these controller numbers, which appear in the
checkpoint history if a (98, 99) or (100, 101) pair is split across the
MIDI command sections of two MWPP packets.

If the most recent active Control Change command for controller 98, 99,
100, or 101 in the checkpoint history is part of a (98, 99) or (100,
101) command pair that begins a parameter system transaction, the
command MUST NOT appear in the controller list. However, if the most
recent active Control Change command for controller 98, 99, 100, or 101
in the checkpoint history does not form part of a (98, 99) or (100, 101)
command pair, an entry MUST appear in the controller list.


Appendix A.8. Chapter M: MIDI Parameter System

A channel journal MUST contain Chapter M if an active Control Change
command that forms part of an initiated parameter system transaction (as
defined below) appears in the checkpoint history.

We begin by defining the terms "parameter system", "parameter system
transaction", and "initiated parameter system transaction" as used in
this Appendix.

  o  Parameter system. This phrase refers to a MIDI feature that
     provides two sets of 16,384 parameters to augment the
     Control Change controller number space. The Registered Parameter
     Names (RPN) system and the Non-Registered Parameter Names
     (NRPN) system each provide 16,384 parameters.

  o  Parameter system transaction. The values of RPNs and NRPNs are
     changed by a series of Control Change commands that form a
     transaction. A transaction begins with two Control Change
     commands to set the parameter number (controller numbers
     98 and 99 for NRPNs, controller numbers 100 and 101 for RPNs).
     The transaction continues with an arbitrary number of
     Data Entry (controller numbers 6 and 38) and Data Button
     (controller numbers 96 and 97) Control Change commands to
     set the parameter value. The transaction ends with a second
     pair of (98, 99) or (100, 101) Control Change commands. These
     terminal commands are considered a part of the transaction.
     In addition, the terminal commands may start a second
     parameter system transaction; in this case, these commands
     belong to two transactions.

  o  Initiated parameter system transaction. An initiated parameter
     system transaction is a transaction whose (98, 99) or (100, 101)
     initial active Control Change command pair appears in the session
     history. Under certain conditions, unpaired active Control Change
     commands for controller numbers 98, 99, 100, or 101 are coded in
     Chapter C, as described in Appendix A.7.

Figure A.8.1 shows the variable-length format of Chapter M.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|P|N|R|R|R|      LENGTH       |  Transaction log list ...     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure A.8.1  Top-level Chapter M format

Chapter M consists of a 2-octet header, followed by a list of transaction
log entries. The 10-bit LENGTH field codes the length of Chapter M, and
conforms to semantics described in Appendix A.1.

If an active Control Change command that forms part of an initiated
parameter system transaction appears in the checkpoint history, a log
entry for the transaction MUST appear in the transaction list.

The relative order of transaction list entries MUST reflect the relative
position of parameter transactions in the session history: the first log
entry codes the most recent parameter transaction in the history, the
second log entry codes a transaction that appears before the first
parameter transaction in the history, etc.

The P header bit is set to 1 if an active Control Change command pair to
terminate the first RPN transaction in the log list does not appear in
the session history. The N header bit has the same role for the first
NRPN transaction in the log list.

Figure A.8.2 shows the structure of a transaction log.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|T|       PARAM-NUMBER        |     KEY       |  DATA   ...   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   ...         |      KEY      |   DATA ...                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure A.8.2  Transaction Log Structure

The transaction log consists of a 2-octet header, followed by a
compressed enumeration of the Control Change commands for controller
numbers 6, 38, 96, and 97 for this transaction in the session history.
The presence of the Control Change commands that terminate the
transaction is coded implicitly by the P and N header bits of the
top-level chapter format (Figure A.8.1).

A transaction log header codes the parameter identity. If T is set to 1,
the log codes an NRPN parameter; if T is set to 0, the log codes an RPN
parameter. The 14-bit PARAM-NUMBER header field codes the parameter
number.
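The Python sketch below packs and unpacks the transaction log header. It assumes, as in the MIDI parameter system, that the 14-bit parameter number combines the MSB command (controller 99 or 101) as the upper 7 bits with the LSB command (controller 98 or 100) as the lower 7 bits. The text above does not spell this out, and the function names are hypothetical.

```python
from typing import Tuple


def transaction_log_header(msb: int, lsb: int, nrpn: bool, s_bit: int = 1) -> bytes:
    """Pack S|T|PARAM-NUMBER; T = 1 codes an NRPN, T = 0 an RPN."""
    param = ((msb & 0x7F) << 7) | (lsb & 0x7F)   # assumed 14-bit layout
    word = (s_bit << 15) | (int(nrpn) << 14) | param
    return word.to_bytes(2, "big")


def parse_transaction_log_header(hdr: bytes) -> Tuple[int, int, int]:
    """Return (S, T, PARAM-NUMBER) from the first two octets."""
    word = int.from_bytes(hdr[:2], "big")
    return word >> 15, (word >> 14) & 1, word & 0x3FFF
```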

The KEY and DATA fields that follow the log header encode the compressed
enumeration of the Control Change commands for numbers 6, 38, 96, and
97. The ordering of this enumeration matches the ordering of commands in
the transaction: the first transaction command appears as the first
command in the enumeration, the second transaction command appears as
the second command in the enumeration, etc.

KEY and DATA fields always appear in pairs in the transaction log; at
least one KEY-DATA pair MUST appear in a transaction log, even if no
Control Change commands need to be coded. The KEY field has a fixed
1-octet size, and acts as a directory for the KEY-DATA pair; the DATA
field has a variable size of 0-3 octets. Figure A.8.3 shows the format
of the KEY octet.

                        0
                        0 1 2 3 4 5 6 7
                       +-+-+-+-+-+-+-+-+
                       |S|M|IN1|IN2|IN3|
                       +-+-+-+-+-+-+-+-+

                   Figure A.8.3 -- Key Octet

The two-bit fields IN1, IN2, and IN3 code the appearance and meaning of
the first, second, and third DATA octets that may follow the KEY octet.
The IN fields code the following information:

  o  IN_k = 00. The DATA octet for this position is not present. The
     permitted placements of the 00 value are: IN1 = IN2 = IN3 = 00
     (no DATA octets follow the KEY octet), IN2 = IN3 = 00 (one DATA
     octet follows the KEY octet), IN3 = 00 (two DATA octets follow the
     KEY octet).

  o  IN_k = 01. Indicates an active Control Change command for
     controller number 6 (Data Entry Slider Coarse); the DATA
     octet codes the third octet of the Control Change command.

  o  IN_k = 02. Indicates an active Control Change command for
     controller number 38 (Data Entry Slider Fine); the DATA
     octet codes the third octet of the Control Change command.

  o  IN_k = 03. Indicates one or more active Control Change commands
     for controller number 96 (Data Button Increment) and/or 97
     (Data Button Decrement), without an intervening Control Change
     command 6 or 38. The DATA octet codes the cumulative effect of the
     Data Button commands, as a two's complement 8-bit value:
     controller 96 commands increment the value by 1, controller
     97 commands decrement the value by 1.
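For the IN_k = 03 case, the following hypothetical Python helpers compute and decode the two's complement DATA octet. The draft does not say what a sender should do if the net count falls outside -128 to 127, so this sketch simply raises an error.

```python
from typing import Iterable


def data_button_octet(controllers: Iterable[int]) -> int:
    """Net effect of a run of Data Button commands as an 8-bit two's
    complement octet: +1 per controller 96, -1 per controller 97."""
    net = sum(1 if c == 96 else -1 for c in controllers if c in (96, 97))
    if not -128 <= net <= 127:
        raise ValueError("net Data Button count does not fit in 8 bits")
    return net & 0xFF


def data_button_value(octet: int) -> int:
    """Decode the two's complement DATA octet back to a signed count."""
    return octet - 256 if octet & 0x80 else octet
```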

The M flag is 1 if another KEY octet follows the DATA octet(s). If M is
0, another transaction log may follow the DATA octet(s), or the DATA
octet(s) may mark the end of Chapter M, depending on the LENGTH field of
the top-level Chapter M header shown in Figure A.8.1.
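A non-normative Python sketch of a receiver walking the KEY-DATA pairs of one transaction log follows. It stops reading DATA octets at the first 00 IN field (the permitted placements of 00 are all trailing) and stops the walk at the first KEY octet with M = 0. The labels in IN_LABELS are hypothetical names for the three IN codes.

```python
from typing import List, Tuple

IN_LABELS = {1: "CC 6", 2: "CC 38", 3: "CC 96/97"}  # IN_k codes 01, 02, 03


def parse_key_data(buf: bytes, offset: int) -> Tuple[List[Tuple[str, int]], int]:
    """Decode KEY-DATA pairs (S|M|IN1|IN2|IN3) until a KEY octet with M = 0.

    Returns the (label, DATA octet) entries and the offset of the first
    octet after the last DATA octet.
    """
    entries: List[Tuple[str, int]] = []
    while True:
        key = buf[offset]
        offset += 1
        more = (key >> 6) & 1                                    # M flag
        for code in ((key >> 4) & 3, (key >> 2) & 3, key & 3):  # IN1, IN2, IN3
            if code == 0:          # 00: no DATA octet here or after
                break
            entries.append((IN_LABELS[code], buf[offset]))
            offset += 1
        if not more:
            return entries, offset
```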

In comparison with other recovery journal chapters, Chapter M is
inefficient: each transaction for a parameter number in the checkpoint
history is listed in the transaction list, and each Control Change
command for a transaction is enumerated in a transaction log. This
design decision trades off recovery journal size for design simplicity.
In practice, parameter system commands rarely appear in MIDI streams,
and this design decision does not have a significant impact on MWPP
bandwidth requirements.


Appendix B. The Recovery Journal System Chapters


Appendix B.1. System Chapter D: Reset, Song Select, Tune Request

The system journal MUST contain Chapter D if an active MIDI Reset
(0xFF), MIDI Tune Request (0xF6), or MIDI Song Select (0xF3) command
appears in the checkpoint history.  Note that General MIDI reset
commands are coded in Chapter X (Appendix B.5), not in Chapter D.
Figure B.1.1 shows the variable-length format for Chapter D.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |S|E|T|G|R|R|R|R|  Command logs ...                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure B.1.1 -- System Chapter D Format

The chapter consists of a 1-octet header, followed by one or more
command logs. Header flag bits indicate the presence of command logs for
the Reset (E = 1), Tune Request (T = 1), and Song Select (G = 1)
commands. Command logs appear in a list following the header, in the
order that their flag bits appear in the header.

Figure B.1.2 shows the 1-octet command log format for the Reset and Tune
Request commands.

                         0
                         0 1 2 3 4 5 6 7
                        +-+-+-+-+-+-+-+-+
                        |S|    COUNT    |
                        +-+-+-+-+-+-+-+-+

       Figure B.1.2 -- Command Log for Reset and Tune Request

Chapter D MUST contain the Reset command log if an active Reset command
appears in the checkpoint history. The 7-bit COUNT field codes the total
number of Reset commands (modulo 128) present in the session history.

Chapter D MUST contain the Tune Request command log if an active Tune
Request command appears in the checkpoint history. The 7-bit COUNT field
codes the total number of Tune Request commands (modulo 128) present in
the session history.

Figure B.1.3 shows the 1-octet command log format for the Song Select
command.

                         0
                         0 1 2 3 4 5 6 7
                        +-+-+-+-+-+-+-+-+
                        |S|    VALUE    |
                        +-+-+-+-+-+-+-+-+

           Figure B.1.3 -- Song Select Command Log Format

Chapter D MUST contain the Song Select command log if an active Song
Select command appears in the checkpoint history. The 7-bit VALUE field
codes the song number of the most recent Song Select command in the
checkpoint history.
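As an illustration of the Chapter D layout, the Python sketch below
reads the header flags and then consumes the command logs in E, T, G
order. It assumes the chapter octets have already been located within
the system journal; it reports the header S bit and masks off the S bit
of each command log. The function name and dictionary keys are
hypothetical.

```python
def parse_chapter_d(chapter: bytes) -> dict:
    """Decode a System Chapter D bitfield (1-octet header plus command logs)."""
    header = chapter[0]
    result = {"S": (header >> 7) & 1}
    pos = 1
    # Command logs follow the header in the order of their flag bits: E, T, G.
    for flag_bit, name in ((6, "reset_count"),
                           (5, "tune_request_count"),
                           (4, "song_select")):
        if header & (1 << flag_bit):
            result[name] = chapter[pos] & 0x7F  # 7-bit COUNT or VALUE field
            pos += 1
    return result
```

For example, the octets 0x70 0x05 0x02 0x11 code five Resets, two Tune
Requests (both modulo 128), and a most recent Song Select of song 0x11.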


Appendix B.2. System Chapter V: Active Sense Command

The system journal MUST contain Chapter V if an active MIDI Active Sense
(0xFE) command appears in the checkpoint history.  Figure B.2.1 shows
the format for Chapter V.

                         0
                         0 1 2 3 4 5 6 7
                        +-+-+-+-+-+-+-+-+
                        |S|    COUNT    |
                        +-+-+-+-+-+-+-+-+

               Figure B.2.1 -- System Chapter V Format

The 7-bit COUNT field codes the total number of Active Sense commands
(modulo 128) present in the session history.
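This memo does not specify how a receiver uses COUNT. One plausible use,
shown here as a Python sketch, is to compare the COUNT in a newly
received journal with the last value the receiver saw, to estimate how
many commands a loss episode hid. The same arithmetic applies to the
Reset and Tune Request COUNT fields of Chapter D. The sketch assumes
fewer than 128 commands occurred between the two observations, since
COUNT wraps modulo 128; the function name is hypothetical.

```python
def missed_commands(previous_count: int, journal_count: int) -> int:
    """Commands implied by the change in a 7-bit COUNT field (modulo 128)."""
    return (journal_count - previous_count) % 128
```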







Appendix B.3. System Chapter Q: Sequencer State Commands

This Appendix describes Chapter Q, the system chapter for the MIDI
sequencer commands.

The system journal MUST contain Chapter Q if an active MIDI Song
Position Pointer (0xF2), MIDI Clock (0xF8), MIDI Tick (0xF9), MIDI Start
(0xFA), MIDI Continue (0xFB) or MIDI Stop (0xFC) command appears in the
checkpoint history. Note that MIDI Tick, a relatively recent addition to
the MIDI standard [1], is a time-based alternative to MIDI Clock: Ticks
occur every 10 ms, independent of tempo.
Figure B.3.1 shows the variable-length format for Chapter Q.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|S|N|D|C|T|Q|TOP|          CLOCK                |   TICKS       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      ...                       |             QNOTE            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  ...          |
+-+-+-+-+-+-+-+-+

               Figure B.3.1 -- System Chapter Q Format

Unlike most chapters, Chapter Q does not provide resiliency by coding
log entries for individual MIDI commands. Instead, Chapter Q captures
the cumulative effect of all sequencer commands in the session history
by encoding the most recent sequencer system state. This coding strategy
yields an efficient chapter design: the minimal Chapter Q configuration
fits in 3 octets.

In a temporal sense, the fields of Chapter Q reflect system state up to
(but not including) the moment encoded by the RTP timestamp of the
packet in which the chapter resides (packet I, as defined in Appendix
A.1). In normal operation, a receiver examines Chapter Q after a packet
loss episode, in order to re-synchronize its open-loop estimation of the
sequencer state. Chapter Q state information includes the position of
the sequencer pointer (coded by the CLOCK and/or TICKS field), the
presence of the downbeat (the D bit), and the on/off state of the
sequencer (the N bit).

In addition, Chapter Q may optionally code an estimate of the current
tempo in the QNOTE field. QNOTE helps loss recovery in two ways. If the
sequencer is running, a tempo estimate may help a receiver
re-synchronize faster. If the sequencer is stopped, QNOTE tracks tempo
changes in the MIDI Clock or MIDI Tick stream; this information helps
receivers react smoothly if a Start or Continue command appears soon
after a packet loss episode.

We now state the normative definition of the Chapter Q bitfields.
Chapter Q consists of a 1-octet header followed by several optional
fields, in the order shown in Figure B.3.1.  Three header bits (C, T,
and Q) indicate the presence of fields following the header.  Two header
bits (N and D) encode aspects of the sequencer system state directly.

Header flag bits C, T, and Q signal the presence of the 16-bit CLOCK
field (C set to 1), the 24-bit TICKS field (T set to 1) and the 24-bit
QNOTE field (Q set to 1).
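The header layout of Figure B.3.1 can be decoded with a few shifts and
masks. The Python sketch below is one way to do so; it assumes the
multi-octet fields are in network (big-endian) byte order, as the bit
numbering of the figure suggests. The function name and keys are
hypothetical.

```python
def parse_chapter_q(chapter: bytes) -> dict:
    """Decode the Chapter Q header and its optional CLOCK/TICKS/QNOTE fields."""
    header = chapter[0]
    fields = {
        "S": (header >> 7) & 1,
        "N": (header >> 6) & 1,
        "D": (header >> 5) & 1,
        "TOP": header & 0x3,
    }
    pos = 1
    # Optional fields appear in figure order: CLOCK (C), TICKS (T), QNOTE (Q).
    for flag_bit, name, size in ((4, "CLOCK", 2), (3, "TICKS", 3), (2, "QNOTE", 3)):
        if header & (1 << flag_bit):
            fields[name] = int.from_bytes(chapter[pos:pos + size], "big")
            pos += size
    fields["length"] = pos  # total octets consumed by the chapter
    return fields
```

For example, a playing sequence (N = 1, D = 1) at clock position 65552
(TOP = 1, CLOCK = 16) codes as the three octets 0x71 0x00 0x10.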

The N header bit encodes the relative occurrence of the Start, Continue
and Stop commands in the session history.  If an active Start or
Continue command appears most recently, N is set to 1.  If an active
Stop appears most recently, or if no active instances of these commands
appear in the session history, N is set to 0.

The D header bit encodes the presence of the downbeat.  If N is set to
1, D is set to 1 if at least one Clock or Tick command follows the most
recent Start or Continue command in the session history. If this
condition does not hold, or if N is 0, then D is set to 0.
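The N and D rules amount to a single pass over the sequencer commands in
the session history. The Python sketch below assumes a hypothetical
input: the opcodes of the active Start, Continue, Stop, Clock, and Tick
commands, oldest first.

```python
from typing import Sequence, Tuple

START, CONTINUE, STOP = 0xFA, 0xFB, 0xFC
CLOCK, TICK = 0xF8, 0xF9


def compute_n_d(history: Sequence[int]) -> Tuple[int, int]:
    """Return the Chapter Q (N, D) bits for a history of sequencer opcodes."""
    n, d = 0, 0
    for opcode in history:
        if opcode in (START, CONTINUE):
            n, d = 1, 0      # sequence (re)started; no timing pulse seen yet
        elif opcode == STOP:
            n, d = 0, 0      # a stopped sequence always codes D = 0
        elif opcode in (CLOCK, TICK) and n == 1:
            d = 1            # a pulse follows the latest Start/Continue
    return n, d
```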

If N is set to 0 (coding a stopped sequence), or if N is set to 1 and D
is set to 0 (coding a sequence on the verge of beginning), Chapter Q
MUST encode the starting song position of the sequence. The C and T
header flags, the optional CLOCK (if C is set to 1) and TICKS (if T is
set to 1) fields, and the TOP header field, act to code the starting
song position, via the methods described below.

   o If C = 0 and T = 0, the starting song position is at the
     beginning of the song.

   o If C = 1 and T = 0, the 2-bit TOP header field and the 16-bit
     CLOCK field are combined to form the 18-bit unsigned quantity
     65536*TOP + CLOCK. This value encodes the starting song
     position, in units of clocks (24 clocks per quarter note).
     Use this method if the MIDI source uses Clock commands as
     timing pulses.

   o If C = 0 and T = 1, the 24-bit TICKS field codes the starting
     song position, in units of milliseconds. Use this method
     if the MIDI source uses Tick commands as timing pulses
     (10 ms per Tick). The song position MUST be encoded using
     sub-Tick (i.e. sub-10ms) resolution.

   o If C = 1 and T = 1, the starting song position is the sum of
     the positions encoded by the CLOCK, TOP and TICKS fields, as
     described above. Use this method if the MIDI stream
     uses Tick commands as timing pulses and also uses the
     clock-based Song Position Pointer commands to reposition
     the sequence.
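The four cases above reduce to the following Python sketch. It returns
the clock-based and millisecond-based components separately: the text
defines the C = 1, T = 1 position as their sum, but adding clocks to
milliseconds requires a tempo, which this sketch leaves to the caller.
The function name is hypothetical.

```python
def song_position(c: int, t: int, top: int = 0, clock: int = 0, ticks: int = 0):
    """Return (clocks, milliseconds) coded by the C/T flags and TOP/CLOCK/TICKS."""
    clocks = 65536 * top + clock if c else 0  # 18-bit count, 24 per quarter note
    millis = ticks if t else 0                 # 24-bit millisecond count
    return clocks, millis
```

For instance, C = 1, T = 0, TOP = 0, and CLOCK = 96 codes a position
four quarter notes into the song (96 clocks at 24 clocks per quarter
note).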

If the N and D header bits are both set to 1, the sequence is playing,
and Chapter Q MUST encode the current song position in the sequence.
The current song position is coded using the same fields and methods as
the starting song position (see above). If the TICKS field is used to
code the current song position, the field value counts time up to the
moment encoded by the RTP timestamp of packet I.

Chapter Q MAY encode an estimate of the current tempo, by setting the Q
header bit to 1, and placing the estimated tempo value in the 24-bit
QNOTE field. The QNOTE field has units of microseconds per quarter note.
This memo does not define a normative algorithm for tempo estimation for
the QNOTE field.  Note that Q may be set to 1 even if N is set to 0,
providing a method for coding current tempo while the sequence is
stopped.
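Because QNOTE is in microseconds per quarter note, converting to and
from beats per minute is a single division. The Python helpers below are
illustrative; clamping to 24 bits is an assumption about how a sender
might handle tempos too slow to represent, not a rule of this memo.

```python
QNOTE_MAX = (1 << 24) - 1  # largest value a 24-bit field can hold


def qnote_from_bpm(bpm: float) -> int:
    """Microseconds per quarter note, rounded and clamped to 24 bits."""
    return min(QNOTE_MAX, round(60_000_000 / bpm))


def bpm_from_qnote(qnote: int) -> float:
    return 60_000_000 / qnote


def clock_period_us(qnote: int) -> float:
    """Spacing of MIDI Clock pulses (24 per quarter note) at this tempo."""
    return qnote / 24
```

At 120 BPM, QNOTE is 500,000 and MIDI Clock pulses arrive about every
20.8 ms; the largest 24-bit value corresponds to roughly 3.6 BPM.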



Appendix B.4. System Chapter E: MIDI Time Code Tape Position

This Appendix describes Chapter E, the system chapter for the MIDI Time
Code (MTC) commands.

The system journal MUST contain Chapter E if an active MIDI System
Common Quarter Frame command (0xF1) or an active finished System
Exclusive (Universal Real Time) MTC Full Frame command (F0 7F cc 01 01
hr mn sc fr F7) appears in the checkpoint history.

Unfinished MTC Full Frame commands are coded in Chapter X, as described
in Appendix B.5. See Appendix A.1 for definitions of finished and
unfinished MIDI commands.

Figure B.4.1 shows the variable-length format for Chapter E.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|S|Q|C|P|D|POINT|                COMPLETE                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 PARTIAL                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


               Figure B.4.1 -- System Chapter E Format

This Appendix contains two sub-sections. B.4.1 is an informative
description of the Chapter E design; B.4.2 is the normative definition
of the Chapter E bitfield semantics.

B.4.1  Informative Description of Chapter E

The MIDI standard uses MTC to tag a particular moment in the MIDI stream
with a SMPTE timestamp (a frame-based timestamp standard for video and
film). In a typical application, a receiver uses these SMPTE timestamps
to synchronize the playback of a video tape deck with the MIDI stream.

MTC provides two methods for sending a SMPTE timestamp. The simple
method, the Full Frame command, encodes the entire timestamp in a
10-octet System Exclusive command. Alternatively, the timestamp value
may be transmitted incrementally, via 8 Quarter Frame commands (each a
0xF1 status octet plus one data octet) sent at regular intervals over
two video frames.
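For reference, the Python sketch below recovers a timestamp from either
method, using the MTC layouts of the MIDI standard [1]: the Full Frame
hours octet carries the frame type in bits 6-5 and the hours in bits
4-0, and each Quarter Frame data octet carries a piece number (0-7) in
its high nibble and 4 timestamp bits in its low nibble. The function
names are hypothetical, and the Quarter Frame input is assumed to have
the 0xF1 status octets already removed.

```python
def parse_full_frame(msg: bytes):
    """Return (typ, hours, minutes, seconds, frames) from F0 7F cc 01 01 hr mn sc fr F7."""
    if (len(msg) != 10 or msg[0] != 0xF0 or msg[1] != 0x7F
            or msg[3:5] != b"\x01\x01" or msg[9] != 0xF7):
        raise ValueError("not an MTC Full Frame command")
    hr, mn, sc, fr = msg[5:9]
    return (hr >> 5) & 0x3, hr & 0x1F, mn, sc, fr


def assemble_quarter_frames(data_octets):
    """Combine the data octets of 8 Quarter Frame commands, in any order."""
    nibble = [None] * 8
    for octet in data_octets:
        nibble[(octet >> 4) & 0x7] = octet & 0xF   # high nibble = piece number
    if None in nibble:
        raise ValueError("incomplete Quarter Frame sequence")
    frames = nibble[0] | ((nibble[1] & 0x1) << 4)
    seconds = nibble[2] | ((nibble[3] & 0x3) << 4)
    minutes = nibble[4] | ((nibble[5] & 0x3) << 4)
    hours = nibble[6] | ((nibble[7] & 0x1) << 4)
    typ = (nibble[7] >> 1) & 0x3                   # piece 7 also carries the type
    return typ, hours, minutes, seconds, frames
```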

Chapter E encodes SMPTE recovery information derived from MTC commands
that appear in the session history. In normal operation, a receiver
examines Chapter E after a packet loss episode, in order to re-
synchronize its open-loop estimation of the current SMPTE time.

Chapter E may hold two SMPTE timestamps. The 24-bit COMPLETE field,
present if the C bit is set, codes the most recent complete MTC
timestamp that appears in the session history. This timestamp may be
coded by one finished Full Frame command or 8 Quarter Frame commands. If
the COMPLETE field codes data from Quarter Frame commands, the COMPLETE
field value is two frames ahead of the timestamp encoded in the Quarter
Frame commands, to compensate for the transmission delay of the
incremental Quarter Frame code.

Chapter E may also contain a 24-bit PARTIAL field, which codes the
timestamp data fragments carried by an incomplete Quarter Frame
sequence. The P bit signals the presence of the PARTIAL field. The D, Q,
and POINT fields hold ancillary data that is essential for decoding the
meaning of the PARTIAL field.

B.4.2  Normative Definition of Chapter E

Chapter E holds information about the most recent MIDI Time Code (MTC)
tape position coded in the session history. Chapter E consists of a
1-octet header followed by two optional fields (COMPLETE and PARTIAL) in
the order shown in Figure B.4.1. The 24-bit COMPLETE field is present if
header bit C is set to 1; the 24-bit PARTIAL field is present if header
bit P is set to 1.
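A Python sketch of the Chapter E layout in Figure B.4.1 follows. As with
the other sketches, it assumes big-endian field order; the function name
and keys are hypothetical.

```python
def parse_chapter_e(chapter: bytes) -> dict:
    """Decode the Chapter E header and its optional COMPLETE/PARTIAL fields."""
    header = chapter[0]
    out = {
        "S": (header >> 7) & 1,
        "Q": (header >> 6) & 1,
        "D": (header >> 3) & 1,
        "POINT": header & 0x7,
    }
    pos = 1
    # COMPLETE (present if C) precedes PARTIAL (present if P); both are 24 bits.
    for flag_bit, name in ((5, "COMPLETE"), (4, "PARTIAL")):
        if header & (1 << flag_bit):
            out[name] = int.from_bytes(chapter[pos:pos + 3], "big")
            pos += 3
    out["length"] = pos
    return out
```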

MTC tape position updates in the session history may occur atomically,
via a finished Full Frame command, or incrementally, via a series of
Quarter Frame commands spaced over the time period of two video frames.
The Q header bit codes whether a Quarter Frame command (Q set to 1) or a
finished Full Frame command (Q set to 0) appears most recently in the
session history.

At any moment in time, the session history may hold a sequence of zero
or more complete MTC frame values. A partially complete MTC frame value
(coded by an incomplete sequence of Quarter Frame commands) may also
appear in the session history (after the most recent complete MTC frame
value, if one exists).

If the session history holds a complete MTC frame, and if the Quarter
Frame command or finished Full Frame command that completes this frame
encoding appears in the checkpoint history, Chapter E MUST include the
24-bit COMPLETE field to encode the frame value. The C header bit is set
to 1 to signal the presence of the COMPLETE field.

If a partially complete MTC frame value appears in the session history
(after the most recent complete MTC frame value, if one exists), if this
partially complete frame value is not malformed (i.e. the high nibble
sequence of Quarter Frame commands starts at 0 and increments
contiguously to an intermediate value, or else starts at 7 and
decrements contiguously to an intermediate value), and if at least one
Quarter Frame command coding this partial value appears in the
checkpoint history, Chapter E MUST include the 24-bit PARTIAL field to
encode the frame value in progress. The P header bit is set to 1 to
signal the presence of the PARTIAL field.
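The well-formedness condition above can be checked directly from the
piece numbers (the high nibbles) of the incomplete Quarter Frame run. The
Python helper below is a sketch; its name is hypothetical, and it treats
a full run of 8 pieces as complete rather than partial.

```python
from typing import Sequence


def partial_is_well_formed(pieces: Sequence[int]) -> bool:
    """True if the piece numbers run 0, 1, 2, ... or 7, 6, 5, ... without gaps."""
    if not 0 < len(pieces) < 8:
        return False              # empty, or a complete run of 8 pieces
    if pieces[0] == 0:
        step = 1
    elif pieces[0] == 7:
        step = -1
    else:
        return False
    return all(b - a == step for a, b in zip(pieces, pieces[1:]))
```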

Note that the PARTIAL field never codes a frame value coded in a Full
Frame command; unfinished Full Frame commands are coded in Chapter X, as
described in Appendix B.5.

The D header flag bit signals the direction the tape is moving.  D is
set to 0 for forward or no movement; D is set to 1 for reverse movement.
If Q is set to 1, the relative motion of the upper nibble of the Quarter
Frame data value determines D. If Q is set to 0, the relative tape
motion from its last position determines D.

The D bit serves two roles in Chapter E. If a PARTIAL field is present
in Chapter E, the D bit serves a syntactic role: its state value is
required to parse the contents of PARTIAL (as explained below). In
addition, the tape direction information coded in the D bit serves an
advisory role for receivers performing tape re-synchronization after a
packet loss episode.

The 3-bit POINT field holds information about the incremental Quarter
Frame encoding in the session history. If Q is set to 1, POINT codes the
upper nibble of the most recent Quarter Frame data value in the session
history. If the PARTIAL field is present in Chapter E, the POINT field
serves a syntactic role: its state value is required to parse the
contents of PARTIAL (as explained below).  If Q is set to 0, POINT is
reserved for future use; senders MUST set POINT to 0x0, and receivers
must ignore its value.

Figure B.4.2 shows the common format for the COMPLETE and PARTIAL
fields.

          0                   1                   2
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |TYP|  HOURS  |  MINUTES  | SECONDS   | FRAMES  |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

           Figure B.4.2 -- COMPLETE and PARTIAL format

The 5-bit HOURS, 6-bit MINUTES, 6-bit SECONDS, and 5-bit FRAMES fields
hold the SMPTE values coded in Full Frame and Quarter Frame commands.
The bit allocations are sufficient to encode legal SMPTE values; note
that for some fields, the associated MIDI commands use larger encodings.
The 2-bit TYP field encodes the SMPTE frame type, using the same
encoding as the Quarter Frame and Full Frame commands.
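The bit widths above (2 + 5 + 6 + 6 + 5 = 24) make packing a matter of
shifts. A Python sketch, assuming the left-to-right order of Figure
B.4.2 maps to most-significant-to-least-significant bits; the function
names are hypothetical.

```python
def pack_smpte(typ: int, hours: int, minutes: int, seconds: int, frames: int) -> int:
    """Pack into the 24-bit TYP|HOURS|MINUTES|SECONDS|FRAMES layout."""
    return (((typ & 0x3) << 22) | ((hours & 0x1F) << 17)
            | ((minutes & 0x3F) << 11) | ((seconds & 0x3F) << 5)
            | (frames & 0x1F))


def unpack_smpte(field: int):
    """Inverse of pack_smpte: return (typ, hours, minutes, seconds, frames)."""
    return ((field >> 22) & 0x3, (field >> 17) & 0x1F, (field >> 11) & 0x3F,
            (field >> 5) & 0x3F, field & 0x1F)
```

For example, 30 fps non-drop (TYP 3) at 01:02:03:04 packs to 0xC21064,
sent as the octets 0xC2 0x10 0x64.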

If used in the COMPLETE field, the TYP, HOURS, MINUTES, SECONDS, and
FRAMES fields hold the most recent complete frame value, encoded by a
finished Full Frame command or a series of 8 Quarter Frame commands in
the session history. If the COMPLETE field codes data from Quarter Frame
commands, the COMPLETE field value is two frames larger than the
timestamp encoded in the Quarter Frame commands, to compensate for the
transmission delay of the incremental Quarter Frame code.
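This memo does not spell out the arithmetic for the two-frame offset. The
Python sketch below assumes ordinary SMPTE carry rules and the MTC type
codes 0 (24 fps), 1 (25 fps), 2 (30 fps drop-frame), and 3 (30 fps
non-drop). It advances the timestamp one frame at a time, so that
rollover and drop-frame numbering are handled uniformly. The function
names are hypothetical.

```python
FPS = {0: 24, 1: 25, 2: 30, 3: 30}   # MTC type code -> frames per second


def advance_one_frame(typ, hours, minutes, seconds, frames):
    frames += 1
    if frames >= FPS[typ]:
        frames = 0
        seconds += 1
        if seconds >= 60:
            seconds = 0
            minutes += 1
            if minutes >= 60:
                minutes = 0
                hours = (hours + 1) % 24
    # Drop-frame numbering skips frames 0 and 1 at the start of every
    # minute except minutes divisible by 10.
    if typ == 2 and frames == 0 and seconds == 0 and minutes % 10 != 0:
        frames = 2
    return typ, hours, minutes, seconds, frames


def complete_from_quarter_frames(timecode):
    """Apply the two-frame offset used when COMPLETE comes from Quarter Frames."""
    for _ in range(2):
        timecode = advance_one_frame(*timecode)
    return timecode
```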

If used in the PARTIAL field, the TYP, HOURS, MINUTES, SECONDS, and
FRAMES fields do not all contain valid values.  Recall that the PARTIAL
field encodes a partially complete SMPTE value encoded by a series of
Quarter Frame commands in the session history. The bits in the PARTIAL
field that correspond to data values in these Quarter Frame commands
hold valid values; all other PARTIAL bits are set to 0.  The valid
PARTIAL bits directly reflect the data values encoded in the Quarter
Frame commands in the session history; this PARTIAL field encoding MUST
NOT include a compensatory offset for transmission delay.

The D and POINT header values signal the valid bits in the PARTIAL
field.  If D is set to 0, PARTIAL field bits corresponding to Quarter
Frame commands with High Nibble values (0, 1, ... POINT) are valid.  If
D is set to 1, PARTIAL field bits corresponding to Quarter Frame
commands with High Nibble values (7, 6, ... POINT) are valid.
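The D and POINT rule above selects a set of Quarter Frame pieces, and
each piece maps to fixed bits of PARTIAL. The Python sketch below
computes the mask of valid PARTIAL bits, assuming the standard MTC
nibble assignment: piece 1 carries FRAMES bit 4, pieces 3 and 5 carry
the two high bits of SECONDS and MINUTES, and piece 7 carries HOURS bit
4 plus TYP. Bit 0 of the mask is the least-significant FRAMES bit. The
names are hypothetical.

```python
PIECE_MASK = {
    0: 0x00000F,  # FRAMES bits 0-3
    1: 0x000010,  # FRAMES bit 4
    2: 0x0001E0,  # SECONDS bits 0-3
    3: 0x000600,  # SECONDS bits 4-5
    4: 0x007800,  # MINUTES bits 0-3
    5: 0x018000,  # MINUTES bits 4-5
    6: 0x1E0000,  # HOURS bits 0-3
    7: 0xE00000,  # HOURS bit 4 and the 2-bit TYP
}


def partial_valid_mask(d: int, point: int) -> int:
    """Mask of PARTIAL bits that hold valid data, given the D and POINT bits."""
    pieces = range(0, point + 1) if d == 0 else range(7, point - 1, -1)
    mask = 0
    for piece in pieces:
        mask |= PIECE_MASK[piece]
    return mask
```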


Appendix B.5. System Chapter X: System Exclusive

This Appendix describes Chapter X, the system journal chapter for the
MIDI System Exclusive command (opcode 0xF0, abbreviation SysEx).

The system journal MUST contain at least one Chapter X entry if an
active SysEx command (excluding a finished MTC Full Frame command)
appears in the checkpoint history. A SysEx command is said to "appear"
in the checkpoint history if the history contains a verbatim encoding of
the SysEx command, or if the history contains at least one segment of
the segmental encoding of the SysEx command.

Note that finished MTC Full Frame commands are coded in Chapter E, as
described in Appendix B.4. Unfinished MTC Full Frame commands, however,
are coded in Chapter X. See Appendix A.1 for definitions of finished and
unfinished commands.

The Chapter X encoding is optimized for the short SysEx commands that
signal real-time events. Chapter X is not intended for use with the
longer SysEx commands used in bulk data transport, because the recovery
journal system is very inefficient if the journal size is large.  A MIDI
session that combines real-time and bulk-data functions SHOULD be sent
over two MWPP streams: a bulk-data stream sent over reliable transport,
and a real-time unreliable stream for shorter commands. The midiport SDP
parameter (Appendix C.4) supports split-stream operation.

Note that the structure of the system journal (Figure 9 in Section 5)
permits multiple entries for Chapter X. Each Chapter X entry codes
information about exactly one SysEx command. The relative ordering of
Chapter X entries MUST reflect the relative position of commands in the
checkpoint history: the first Chapter X entry codes the most recent
SysEx command in the history, the second Chapter X entry codes a SysEx
command that appears before the first coded SysEx command in the
history, etc.

A Chapter X entry for a SysEx command encodes all information about the
command that appears in the session history (as distinct from the
checkpoint history, see Appendix A.1 for definitions).  This distinction
is relevant for the coding of SysEx commands whose segments appear
across multiple packets. In this case, the Chapter X entry includes the
starting segments for the SysEx command, even if these segments no
longer appear in the checkpoint history.

Chapter X provides two tools for encoding multiple SysEx commands of the
same type. Each command of a certain type may be encoded in a separate
Chapter X entry (the list tool) or only the most recent command of a
certain type may be encoded (the recency tool).

Each active SysEx command that appears in the checkpoint history MUST be
associated with a Chapter X entry via the list or recency tool
(excluding finished MTC Full Frame commands).  For each SysEx command
type, an implementation may choose either coding tool. Simple
implementations may use the list tool for all command types;
sophisticated implementations may reduce bandwidth by using the recency
tool for some command types.

Figure B.5.1 shows the variable-length format for System Chapter X.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|S|IDC|L|T| LEN |  DATA ...                                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Figure B.5.1 -- System Chapter X Format

Chapter X consists of a 1-octet header, followed by an arbitrary-length
DATA field. The DATA field encodes a modified version of the data octets
of the SysEx command, as described below. The leading 0xF0 and trailing
0xF7 SysEx octets never appear in the DATA field.

If the Manufacturer ID value of the SysEx command (coded in the first
data octet of the SysEx command) is 0x00, 0x7E, or 0x7F, the DATA field
begins with the second data octet of the SysEx command; for all other
Manufacturer ID values, the DATA field begins with the first data octet
of the SysEx command. The 2-bit IDC header field codes the 0x00, 0x7E,
and 0x7F ID values, using the method shown in Figure B.5.2.

---------------------------------------------------------------
| IDC | Manufacturer ID                | First DATA octet is: |
|-----|--------------------------------|----------------------|
| 0x0 | 0x7E (Universal Non-Real-Time) | 2nd SysEx data octet |
|-----|--------------------------------|----------------------|
| 0x1 | 0x7F (Universal Real-Time)     | 2nd SysEx data octet |
|-----|--------------------------------|----------------------|
| 0x2 | 0x00 (Extension Escape Code)   | 2nd SysEx data octet |
|-----|--------------------------------|----------------------|
| 0x3 | in the range 0x01--0x7D        | 1st SysEx data octet |
---------------------------------------------------------------

                Figure B.5.2 -- IDC Header Field Encoding
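The IDC step of Figure B.5.2 can be sketched in Python as follows. The
input is assumed to be the SysEx data octets with 0xF0 and 0xF7 already
removed; the function name is hypothetical.

```python
def split_manufacturer_id(sysex_data: bytes):
    """Return (IDC, DATA start) following the IDC table of Figure B.5.2."""
    idc_for_id = {0x7E: 0x0, 0x7F: 0x1, 0x00: 0x2}
    first = sysex_data[0]
    if first in idc_for_id:
        return idc_for_id[first], sysex_data[1:]  # ID implied by IDC; skip it
    return 0x3, sysex_data                        # keep the ID octet in DATA
```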

The 3-bit LEN header field codes the exact length of short, complete
SysEx commands, and signals alternative coding techniques for longer
commands and truncated commands.

The LEN values 0x0 through 0x5 indicate that the length of the DATA
field is 1-6 octets. For these LEN values, the DATA field encodes a
complete SysEx command, as a verbatim copy of the SysEx data octets
(possibly skipping the first octet, as detailed in Figure B.5.2).

The LEN value 0x6 indicates that the DATA field contains 7 or more
octets. The DATA field encodes a complete SysEx command, as a verbatim
copy of the data octets of the SysEx command (possibly skipping the
first octet, as detailed in Figure B.5.2), with one exception: bit 7
(the most-significant bit) of the final data octet is set to one. This
set bit implicitly codes the length of the DATA field (MIDI data octets,
by definition, clear bit 7).

The LEN value 0x7 indicates that the DATA field encodes a truncated
SysEx command. This coding option is only to be used for SysEx commands
encoded using the segmented method, for the case where not all segments
appear in the session history.

If LEN is 0x7, the DATA field encodes the data octets of the SysEx
command segments that appear in the session history. The DATA field
holds a verbatim copy of the data octets of the coded portion of the
SysEx command, with two exceptions: the first octet may be skipped (as
detailed in Figure B.5.2) and bit 7 (the most-significant bit) of the
final coded data octet is set to one (to provide an implicit field
length, as in the case where LEN is 0x6).
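The LEN rules and the header layout of Figure B.5.1 can be combined into
a small encoder and decoder. The Python sketch below is illustrative: it
assumes the bits of Figure B.5.1 map to S (bit 7), IDC (bits 6-5), L
(bit 4), T (bit 3), and LEN (bits 2-0), and it rejects an empty DATA
field, a case the text does not address. The function names are
hypothetical.

```python
def chapter_x_header(s: int, idc: int, list_flag: int, t_flag: int, len_code: int) -> int:
    """Pack S (bit 7), IDC (bits 6-5), L (bit 4), T (bit 3), LEN (bits 2-0)."""
    return (s << 7) | (idc << 5) | (list_flag << 4) | (t_flag << 3) | len_code


def encode_len_data(data: bytes, truncated: bool = False):
    """Return (LEN, DATA) for the data octets left after the IDC step."""
    if not data:
        raise ValueError("this sketch assumes at least one DATA octet")
    if not truncated and len(data) <= 6:
        return len(data) - 1, bytes(data)          # LEN 0x0-0x5: exact length
    marked = bytes(data[:-1]) + bytes([data[-1] | 0x80])
    return (0x7 if truncated else 0x6), marked    # bit 7 marks the last octet


def decode_len_data(len_code: int, buf: bytes) -> bytes:
    """Recover the data octets from the start of buf."""
    if len_code <= 0x5:
        return bytes(buf[:len_code + 1])
    for i, octet in enumerate(buf):
        if octet & 0x80:                          # first octet with bit 7 set
            return bytes(buf[:i]) + bytes([octet & 0x7F])
    raise ValueError("no octet with bit 7 set")
```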

The L and T header flags describe the coding tool used for the Chapter X
bitfield. If L is set to 1 (the list tool), all SysEx commands of this
type have an associated Chapter X bitfield in the system journal.  If L
is set to 0 (the recency tool), only the most recent SysEx command of
this type has an associated Chapter X bitfield in the system journal.

The T flag defines the meaning of the word "type" in the previous
paragraph. The T flag has different semantics for MIDI Universal SysEx
commands (Manufacturer ID 0x7E and 0x7F) and for generic SysEx commands
(all other Manufacturer ID values).

We first define the T flag for Universal SysEx commands. The first four
data octets of Universal commands have defined semantics in the MIDI
standard; we symbolically represent these four octets as: ID cc SubID
SubID1. If T is set to 0, all Universal commands with the same ID, cc,
SubID, and SubID1 values are considered the same type. If T is set to 1,
all Universal commands with the same ID, cc, and SubID values are
considered the same type.

For generic SysEx commands (all Manufacturer ID values except 0x7E and
0x7F), we define the T flag as follows. The first data octet of a generic
SysEx command is the Manufacturer ID; the remaining data octets may
have an arbitrary organization, but often have a set of octets coding
device and sub-command, followed by data octets for the command.

If T is set to 0, all generic SysEx commands with the same ID value are
considered to be of the same type. If T is set to 1, the SysEx command
is assumed to have a device/sub-command/data organization, and all
generic SysEx commands with the same ID value, device, and sub-command
values are considered to be of the same type. If the SysEx command has a
multi-level sub-command structure, these semantics require identical
sub-command values at all levels.
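The T-flag rules define a grouping key, and the recency tool keeps only
the newest command per key. The Python sketch below shows both. For
generic commands with T = 1 it assumes a hypothetical one-level layout
(ID, device, sub-command), and it does not handle three-octet IDs that
begin with 0x00. Input octets exclude 0xF0 and 0xF7; the function names
are hypothetical.

```python
def sysex_type_key(data: bytes, t_flag: int) -> tuple:
    """Key grouping SysEx commands of the same "type" under the T flag."""
    if data[0] in (0x7E, 0x7F):                 # Universal: ID cc SubID SubID1
        return tuple(data[:3]) if t_flag else tuple(data[:4])
    return tuple(data[:3]) if t_flag else tuple(data[:1])


def recency_select(history, t_flag: int):
    """Recency tool: keep the most recent command of each type.

    history lists SysEx data octets oldest first; the result is newest
    first, the order in which Chapter X entries must appear.
    """
    seen = set()
    kept = []
    for data in reversed(history):
        key = sysex_type_key(data, t_flag)
        if key not in seen:
            seen.add(key)
            kept.append(data)
    return kept
```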


Appendix C. MWPP Session Description Protocol (SDP) Definitions


Appendix C.1. SDP Definitions: Recovery Journal

In this Appendix, we define session description parameters that affect
the recovery journal.

C.1.1. The recj Parameter

By default, MWPP streams that use unreliable transport (such as UDP)
MUST contain a recovery journal in each packet, and MWPP streams that
use reliable transport (such as TCP) MUST NOT contain a recovery journal
in each packet.

In some applications, this behavior is not optimal. For example, it is
possible to write percussive musical instrument models in Structured
Audio that are inherently robust to lost MIDI data. If an MWPP
mpeg4-generic UDP stream drives these models, the recovery journal
section is not needed.

To override the default, the MWPP-specific SDP parameter recj may be
used to code the presence (1) or absence (0) of the recovery journal
section in MWPP packets. For example, this stream description configures
a UDP stream that does not use the recovery journal:

m=audio 5004 RTP/AVP 96
c=IN IP4 169.229.60.64
a=rtpmap:96 mwpp/44100
a=fmtp:96 recj=0;




C.1.2. The chmay, chnever, and chmust Parameters

By default, a chapter appears in the recovery journal if the normative
text for the chapter in Appendices A.1-8 or B.1-5 demands it.  These
appendices use the MUST keyword to specify the conditions under which a
chapter must appear in the recovery journal.

The MWPP-specific SDP parameters chmay, chnever, and chmust act to
change the inclusion conditions for chapters. The chmay parameter
changes the MUST keyword conditional for chapter inclusion into a MAY.
The chnever parameter specifies chapter types that MUST NOT appear in
the recovery journal. The chmust parameter reaffirms the default MUST
keyword for a chapter; this parameter simplifies the SDP for complex
recovery journal configurations.

These chmay, chnever, and chmust parameters use the following syntax:

  <parameter> = [optional comma-separated channel list,][chapter list];

The channel list specifies the channel journals for which this parameter
applies; if no channel list is provided, the parameter applies to all
channel journals.  The chapter list specifies the channel and system
chapters for which this parameter applies, using a concatenated list of
one or more upper-case letters corresponding to the chapter types. The
channel list is irrelevant for system chapters.  Multiple assignments to
these parameters have a cumulative effect, and are applied in the order
of parameter appearance.
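One way to read "cumulative, applied in order" is that a later
assignment overrides an earlier one for the same channel and chapter.
The Python sketch below applies that interpretation to an fmtp value and
returns a lookup function. Channel numbers are kept as written in the
SDP, a channel of None stands for a system chapter (whose channel list
is ignored), and the function name is hypothetical.

```python
def parse_chapter_policy(fmtp_value: str):
    """Return policy(channel, chapter) -> "must" | "may" | "never"."""
    keyword = {"chmust": "must", "chmay": "may", "chnever": "never"}
    rules = []
    for item in fmtp_value.split(";"):
        name, _, value = item.strip().partition("=")
        if name not in keyword:
            continue                              # other parameters, e.g. recj
        *channels, chapters = value.split(",")
        channel_set = {int(c) for c in channels} or None  # None: all channels
        rules.append((channel_set, set(chapters), keyword[name]))

    def policy(channel, chapter: str) -> str:
        level = "must"                            # default normative conditions
        for channel_set, chapters, rule_level in rules:
            if chapter not in chapters:
                continue
            if channel is None or channel_set is None or channel in channel_set:
                level = rule_level                # later assignments win
        return level

    return policy
```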

For example, the following stream configuration includes an fmtp line
that removes protection for poly and channel aftertouch commands on all
channels, weakens note command protection for channels 14 and 15, and
removes pitch wheel protection for all channels except channel 12:

m=audio 5004 RTP/AVP 96
c=IN IP4 169.229.60.64
a=rtpmap:96 mwpp/44100
a=fmtp:96 chnever=WTA;chmay=14,15,N;chmust=12,W;

The chnever, chmay, and chmust parameters are intended for
efficiency-conscious applications that might need to restrict resiliency
coverage to a few channels or a few chapter types, to conserve bandwidth
or computation.


Appendix C.2. SDP Definitions: Command Execution Semantics

As defined in Section 3, the MIDI command section of the MWPP payload
consists of a list of MIDI commands, each with an associated command
timestamp. By default, a command timestamp indicates the execution time
for the command. If two commands have identical timestamps, the commands
execute simultaneously.

This default timestamp behavior is not a good fit for the MIDI wire
protocol [1]. The MIDI wire protocol, a networking standard for the
remote control of musical instruments over serial lines, does not send
timestamps over the wire. Instead, MIDI commands are placed on the wire
at the moment of occurrence, and receivers infer the timestamp from the
moment of reception. In this memo, we refer to this coding technique as
an "implicit" or a "time-of-arrival" code.

As these names suggest, it is not possible to code two simultaneous MIDI
commands over the MIDI wire protocol, because two commands cannot be
simultaneously sent over a serial line. If two musical events occur at
the same moment in time, a wire protocol sender arbitrarily sends one
MIDI command first, followed by the second MIDI command. The wire
protocol receiver sees a sequence of MIDI commands offset in time, but
cannot tell if the MIDI command offsets are serialization artifacts or
genuine event timing offsets played by the musician.

This Appendix defines alternative semantics of MIDI command timestamps,
for use in transcoding time-of-arrival MIDI data streams into MWPP
packets. The optional SDP parameter tsmode codes the choice of timestamp
semantics. The tsmode parameter takes on one of three symbolic values:
comex, async, or buffer.

The comex value indicates the default "command execution timestamp"
semantics defined in Section 3. The async and buffer values code two
different methods for coding MIDI wire protocol data, which we describe
in sub-sections C.2.1 and C.2.2 below.

The async and buffer methods are based on a simple idea: each method
describes a sampling algorithm to sense data octets on a MIDI wire. The
async and buffer methods use several SDP parameters to characterize the
physical properties of the sampling algorithm, so that the methods can
model a wide range of plausible hardware and operating system
environments.

One such SDP parameter is linerate. The linerate parameter codes the
timespan of one octet on the serial line. The linerate parameter has
units of nanoseconds, and takes on integral values. For the MIDI wire
protocol as defined in [1], linerate is 320,000 nanoseconds. Implicit
MIDI data sent over other physical layers (such as IEEE-1394) might
require a different linerate value. If linerate is not specified, it is
considered to be undefined.
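
For the MIDI serial line of [1], the 320,000 nanosecond figure follows
from the line's 31.25 kbaud rate and its 10 bit times per octet (one
start bit, eight data bits, one stop bit). The Python sketch below,
offered only as an illustration, derives an integral linerate value from
these serial parameters; the function name linerate_ns and its arguments
are hypothetical, and another physical layer would substitute its own
bit rate and framing.

```python
def linerate_ns(baud_rate: int = 31250, bits_per_octet: int = 10) -> int:
    """Time span of one octet on a serial line, in integral nanoseconds.

    The defaults describe the MIDI 1.0 serial line: 31.25 kbaud, with
    1 start bit + 8 data bits + 1 stop bit framing each octet.
    """
    numerator = bits_per_octet * 1_000_000_000
    # Integer arithmetic, rounded to the nearest nanosecond (halves up).
    return (2 * numerator + baud_rate) // (2 * baud_rate)


print(linerate_ns())  # 320000
```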

We now describe the async and buffer methods in detail.

C.2.1 Description of the async method

The async method assumes an asynchronous sampling of the MIDI serial
line. At the moment a complete octet is received, it is labelled with an
accurate wall-clock time value, whose units match the units of the RTP
header timestamp field.

The MWPP-specific SDP parameter octpos defines how MWPP command
timestamps are derived from these octet timestamps. If octpos has the
symbolic value first, a MIDI command timestamp codes the time value for
the first octet of the MIDI command. If octpos has the symbolic value
last, a MIDI command timestamp codes the time value for the last octet
of the MIDI command.  If an octpos parameter does not appear in the
session description, the MIDI command timestamp value may reflect any
octet of the MIDI command.
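
For illustration, the octpos rules of the async method can be written as
a small Python function. The sketch assumes that the sender has already
grouped each MIDI command's timestamped octets in the order they
appeared on the wire (so a running-status command contributes only its
data octets); WireOctet and command_timestamp are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class WireOctet:
    value: int      # the octet as sensed on the MIDI wire
    timestamp: int  # arrival time, in RTP timestamp units


def command_timestamp(octets: Sequence[WireOctet], octpos: Optional[str]) -> int:
    """Pick the MWPP command timestamp for one wire command (async method)."""
    if not octets:
        raise ValueError("a MIDI command has at least one octet")
    if octpos == "last":
        return octets[-1].timestamp
    if octpos in ("first", None):
        # octpos=first, or no octpos parameter (any octet is allowed;
        # this sketch arbitrarily uses the first one).
        return octets[0].timestamp
    raise ValueError(f"unknown octpos value: {octpos!r}")


# NoteOn 0x90 0x3C 0x64; at 44100 Hz one 320 us octet time is ~14 ticks.
note_on = [WireOctet(0x90, 1000), WireOctet(0x3C, 1014), WireOctet(0x64, 1028)]
print(command_timestamp(note_on, "first"), command_timestamp(note_on, "last"))
# 1000 1028
```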

Note that the octpos value refers to the first or last octet of the MIDI
command as it appears on the MIDI wire, not the MIDI command as it
appears in the MWPP packet. This distinction is important for cases
where the MWPP command representation includes extra octets that do not
appear on the MIDI wire. For example, if a MIDI command appears on the
wire using running status coding, and this command becomes the first
command in the MIDI command section of an MWPP packet, the MWPP
representation begins with a status octet that did not appear in the
original MIDI source on the wire.

In the case of segmented SysEx commands (see Section 3), the octpos
rules apply to the octets of the SysEx command segment as they appear on
the MIDI wire.

We now show a session description example for the async method.
Consider an MWPP sender that is transcoding a MIDI wire protocol command
stream into an MWPP UDP RTP stream. The sender runs on a computing
platform that timestamps every incoming octet on the MIDI cable serial
line, and the sender chooses to use the timestamp of the first octet of
each command as the MIDI command timestamp. This stream description
accurately describes the transcoding:

m=audio 5004 RTP/AVP 96
c=IN IP4 169.229.60.64
a=rtpmap: 96 mwpp/44100
a=fmpt: 96 tsmode=async;linerate=320000;octpos=first;

C.2.2 Description of the buffer method

The buffer method uses a synchronous sampling of the MIDI wire data. In
this model, each arriving octet on the MIDI wire is placed in a buffer,
without adding a timestamp.

At periodic intervals, the MWPP sender examines the buffer. The sender
removes complete MIDI commands from the buffer, and places those
commands into the MIDI command section of an MWPP packet. The command
timestamp reflects the actual moment of buffer examination, expressed in
the units of the RTP timestamp field. Note that in this coding scheme,
several commands may have the same command timestamp.

The MWPP-specific SDP parameter mperiod defines the nominal periodic
sampling interval for the buffer tsmode. The mperiod parameter takes on
positive integral values, and has units of the RTP timestamp field.

The MWPP-specific SDP parameter octpos (described in C.2.1 for the async
method) is also defined for the buffer method, but takes on different
semantics. These semantics address the choice of the command timestamp
for MIDI commands whose octets appear on the MIDI wire across several
sampling periods.

If octpos takes on the symbolic value first, the command timestamp
reflects the arrival period of the first octet of the command on the
wire. If octpos takes on the symbolic value last, the command timestamp
reflects the arrival period of the last octet of the command on the
wire.

If an octpos parameter does not appear in the session description, MIDI
commands whose octets appear across several sampling periods may take on
the timestamp value associated with any arrival period of an octet in
the command. In the case of segmented SysEx commands (see Section 3),
the octpos rules apply to the octets of the SysEx command segment as
they appear on the MIDI wire.
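
The following Python sketch models a buffer-method sender, including
the octpos choice for commands that straddle sampling periods. It is
illustrative only: the BufferSampler class is hypothetical, the sender
is assumed to call examine() once per sampling period with the current
RTP clock value, and only MIDI channel voice commands (with running
status) are parsed; System messages, including SysEx, are skipped
rather than transcoded.

```python
from typing import List, Optional, Tuple

# Data octets following each channel voice status (high nibble of status).
DATA_LENGTH = {0x8: 2, 0x9: 2, 0xA: 2, 0xB: 2, 0xC: 1, 0xD: 1, 0xE: 2}


class BufferSampler:
    """Hypothetical sketch of a buffer-tsmode sender (channel voice only)."""

    def __init__(self, octpos: Optional[str] = None) -> None:
        self.octpos = octpos
        self.buffer: List[int] = []        # untimed octets from the wire
        self.status: Optional[int] = None  # current running status
        self.wire: List[int] = []          # octets of the incomplete command
        self.start = 0                     # period in which self.wire began

    def receive(self, octet: int) -> None:
        """Called as each octet arrives; no timestamp is recorded."""
        self.buffer.append(octet)

    def examine(self, now: int) -> List[Tuple[int, List[int]]]:
        """Sample the buffer at RTP time `now`; return complete commands."""
        commands: List[Tuple[int, List[int]]] = []
        for octet in self.buffer:
            if octet >= 0xF8:
                continue  # System Real-Time: skipped, status unaffected
            if octet >= 0xF0:
                # System Common / SysEx: skipped; cancels running status.
                self.status, self.wire = None, []
                continue
            if octet & 0x80:
                # Channel voice status octet: begins a new command.
                self.status, self.wire, self.start = octet, [octet], now
                continue
            if self.status is None:
                continue  # data octet without a status: discard
            if not self.wire:
                self.start = now  # running status: command begins here
            self.wire.append(octet)
            needed = DATA_LENGTH[self.status >> 4]
            has_status = 1 if self.wire[0] & 0x80 else 0
            if len(self.wire) - has_status == needed:
                # octpos=first: period of the first wire octet. Otherwise
                # (octpos=last, or absent, which permits any period):
                # this sampling instant.
                stamp = self.start if self.octpos == "first" else now
                # For simplicity, always emit the full status octet.
                commands.append((stamp, [self.status] + self.wire[-needed:]))
                self.wire = []
        self.buffer.clear()
        return commands


sampler = BufferSampler(octpos="first")
for octet in (0x90, 0x3C):        # a NoteOn split across two periods
    sampler.receive(octet)
print(sampler.examine(now=44))    # []
for octet in (0x64, 0x3E, 0x64):  # its last octet, then a running-status NoteOn
    sampler.receive(octet)
print(sampler.examine(now=88))    # [(44, [144, 60, 100]), (88, [144, 62, 100])]
```

With octpos=last, both commands in the second sample would carry the
timestamp 88.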

We now show a session description example for the buffer method.
Consider an MWPP sender that is transcoding a MIDI wire protocol command
stream into an MWPP UDP RTP stream.  The sender runs on a computing
platform that places MIDI serial line data into a buffer upon receipt,
without timestamps.

The sender polls the buffer 1000 times a second, extracts all complete
commands from the buffer, and places them in the MIDI command section of
an MWPP packet. All of the MIDI command timestamps in this packet are
identical, and reflect the actual clock value at the sampling instant,
in RTP timestamp units. This stream description accurately describes the
transcoding:

m=audio 5004 RTP/AVP 96
c=IN IP4 169.229.60.64
a=rtpmap: 96 mwpp/44100
a=fmpt: 96 tsmode=buffer;linerate=320000;octpos=last;mperiod=44;

Note that mperiod takes on an integral value, and has the units of the
RTP timestamp field. In this example, the mperiod value of 44 is derived
by dividing the rtpmap srate (44100 Hz) by the 1000 Hz buffer sampling
rate, and rounding to the nearest integer.  The MIDI command timestamps
might not advance by exact multiples of 44, as the actual buffer
sampling period might not precisely match the nominal sampling period.
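
The mperiod arithmetic above can be expressed as a small Python helper,
shown here as a sketch; the nominal_mperiod name is hypothetical. It
uses integer arithmetic that rounds halves upward, which is one possible
reading of "rounding to the nearest integer".

```python
def nominal_mperiod(srate_hz: int, poll_rate_hz: int) -> int:
    """Nominal buffer sampling interval, in RTP timestamp units."""
    if srate_hz <= 0 or poll_rate_hz <= 0:
        raise ValueError("rates must be positive")
    # Round srate / poll_rate to the nearest integer (halves round up).
    period = (2 * srate_hz + poll_rate_hz) // (2 * poll_rate_hz)
    return max(1, period)  # mperiod must be a positive integer


print(nominal_mperiod(44100, 1000))  # 44 (44100 / 1000 = 44.1)
```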

Appendix C.3. SDP Definitions: Media Time

In Section 2.1.3, we define the media time of an MWPP RTP packet as the
RTP timestamp difference (modulo 2^32) between the packet's successor
and the packet itself.
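
Because the RTP timestamp is a 32-bit field that may wrap around, the
subtraction in this definition is performed modulo 2^32. A minimal
Python sketch follows; the media_time name is hypothetical.

```python
RTP_TS_MODULUS = 1 << 32  # RTP timestamps are 32-bit unsigned values


def media_time(ts: int, successor_ts: int) -> int:
    """Media time of a packet: successor timestamp minus its own, mod 2^32."""
    return (successor_ts - ts) % RTP_TS_MODULUS


# The modular subtraction absorbs timestamp wraparound.
print(media_time(0xFFFFFF00, 0x00000100))  # 512
```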

By default, the media time for a packet may be arbitrarily long.  For
example, consider an MWPP stream that codes the real-time behavior of a
musician playing a piano keyboard. If the musician does not play a note
for several seconds, there is no reason to send a new packet, and so the
media time of the last packet sent may grow without bound.

However, for some applications, it is desirable to set a maximum media
time for an MWPP packet that is independent of the source rate of MIDI
event data. This constraint acts to set a minimum packet sending rate,
which may simplify algorithms performing clock-skew compensation,
network latency estimation, and packet loss recovery.

Applications may use the SDP maxptime parameter (defined in [9]) for
this purpose. The maxptime parameter specifies the maximum amount of
media time an MWPP packet encodes, in units of milliseconds. For
example, the following session description sets a maximum media time of
0.5 seconds, and thus a minimum packet rate of 2 Hz:

m=audio 5004 RTP/AVP 96
c=IN IP4 169.229.60.64
a=rtpmap: 96 mwpp/44100
a=fmpt: 96 maxptime=500;
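
A sender that enforces maxptime must convert the millisecond limit into
RTP timestamp units using the rtpmap srate; for the example above, 500
ms at 44100 Hz is 22050 ticks. The Python sketch below shows this
conversion and a check a sender might run in its scheduling loop. Both
helper names are hypothetical, and the sketch assumes the sender tracks
the RTP timestamp of its most recently sent packet.

```python
RTP_TS_MODULUS = 1 << 32


def maxptime_ticks(maxptime_ms: int, srate_hz: int) -> int:
    """maxptime in RTP timestamp units, rounded down so that a packet
    sent at this limit never exceeds maxptime."""
    return maxptime_ms * srate_hz // 1000


def packet_due(last_ts: int, now_ts: int, limit_ticks: int) -> bool:
    """True once the media time of the last packet sent has reached the
    limit, so a new packet must go out now (modulo-2^32 timestamps)."""
    return (now_ts - last_ts) % RTP_TS_MODULUS >= limit_ticks


limit = maxptime_ticks(500, 44100)
print(limit)                                  # 22050
print(packet_due(1000, 1000 + 22049, limit))  # False
print(packet_due(1000, 1000 + 22050, limit))  # True
```

Because the check fires only once the limit is reached, a sender would
evaluate it often enough (or schedule the packet in advance) that the
limit is not overshot.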

Appendix C.4. SDP Definitions: Multiple Streams

Several MWPP streams may appear in a session description. By default,
each MWPP stream is an independent entity.  The MIDI name space (16 MIDI
channels + MIDI Systems) for each MWPP stream is unique, and the
rendering for each MWPP stream proceeds independently. The audio outputs
of the streams are presented simultaneously, using the standard
synchronization and audio mixing conventions for RTP.

In this Appendix, we define two MWPP-specific SDP parameters for use in
sessions with several MWPP streams. These parameters (midiport and
zerosync) add three features to MWPP:

  1. Several MWPP streams may target the same MIDI name space.

  2. Several MWPP streams may be bundled to form a larger MIDI
     name space, that a single rendering system may treat as
     an ordered entity.

  3. Receivers may be informed of the synchronized behavior of the
     RTP timestamp fields of several MWPP streams, to simplify the
     time-locked rendering of multi-stream MWPP systems.

In Sections C.4.1 and C.4.2, we normatively define the midiport and
zerosync parameters. In Section C.4.3, we show a series of examples
that illustrate the feature set described above.

C.4.1 The midiport parameter

The midiport SDP parameter codes an arbitrary identification number for
the MIDI name space (16 MIDI channels + MIDI Systems) of an MWPP stream.
The midiport parameter may take on integer values between 0 and
4294967295.

If several MWPP streams in a session share the same midiport value, the
streams target the same MIDI name space. We refer to this relationship
as the identity relationship.

If several MWPP streams in a session have contiguous midiport values
(i.e., i, i+1, ..., i+k), the name spaces of the MWPP streams form an
ordered entity. In this case, the streams in the entity are said to
share an ordered relationship.

Note that streams may participate in both an identity and an ordered
relationship, if MWPP streams in an identity relationship have a
midiport value that forms part of an ordered relationship. If the
midiport values of two MWPP streams are not part of an ordered or
identity relationship, the two streams are independent, and have
independent MIDI name spaces.
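
The identity and ordered relationships can be derived mechanically from
the midiport values in a session description. The Python sketch below
is one way to do so; the classify_midiports function and the stream
labels are hypothetical. It treats each maximal run of contiguous
midiport values as one ordered relationship.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def classify_midiports(
        ports: Dict[str, int]) -> Tuple[List[List[str]], List[List[int]], List[str]]:
    """Split streams (label -> midiport) into identity groups, ordered
    runs of midiport values, and independent streams."""
    by_port: Dict[int, List[str]] = defaultdict(list)
    for stream, port in ports.items():
        by_port[port].append(stream)

    # Identity relationship: two or more streams share a midiport value.
    identity = [sorted(streams) for _, streams in sorted(by_port.items())
                if len(streams) > 1]

    # Ordered relationship: a maximal run of contiguous midiport values.
    ordered: List[List[int]] = []
    run: List[int] = []
    for port in sorted(by_port):
        if run and port == run[-1] + 1:
            run.append(port)
            continue
        if len(run) > 1:
            ordered.append(run)
        run = [port]
    if len(run) > 1:
        ordered.append(run)

    # Independent: streams in neither kind of relationship.
    in_ordered = {port for r in ordered for port in r}
    independent = sorted(s for port, streams in by_port.items()
                         if len(streams) == 1 and port not in in_ordered
                         for s in streams)
    return identity, ordered, independent


print(classify_midiports({"a": 12, "b": 12, "c": 5, "d": 6, "e": 40}))
# ([['a', 'b']], [[5, 6]], ['e'])
```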

MWPP streams in an ordered or identity relationship MUST all have the
same media type (mwpp or mpeg4-generic).

For the mpeg4-generic media type, all MWPP streams in an ordered or
identity relationship render using the same instance of the synthesis
engine, and thus the following restrictions apply:

  1. All streams in an identity or ordered relationship must have
     the same profile-level-id (74 for Main Synthetic, 75 for
     Wavetable Synthesis, 76 for General MIDI).

  2. Ordered relationships MUST NOT be used with Wavetable Synthesis
     or General MIDI object types, because these systems are only
     defined for 16 MIDI voice channels. Ordered relationships MAY
     be used with the Main Synthetic object type, and follow the
     MIDI semantics defined in Section 5.14.3.2.2 of [5].

  3. At most one of the streams in an identity or ordered
     relationship may have a config parameter value other than
     the empty string. In this case, the non-empty config value
     configures the stream. Alternatively, the config parameter
     for all streams may be set to the empty string. In this case,
     exactly one stream in the relationship MUST define the
     configuration using the tools described in Section C.5.
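
A receiver might check these restrictions before instantiating a shared
synthesis engine. The Python sketch below is a hypothetical checker:
the Mpeg4Stream type is invented for illustration, and restriction 3's
all-empty case is approximated by requiring exactly one stream to carry
a render parameter (Appendix C.5), which is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional

MAIN_SYNTHETIC, WAVETABLE_SYNTHESIS, GENERAL_MIDI = 74, 75, 76


@dataclass
class Mpeg4Stream:
    profile_level_id: int
    config: str                   # hex config string, or "" if empty
    render: Optional[str] = None  # render parameter value, if any


def relationship_problems(streams: List[Mpeg4Stream], ordered: bool) -> List[str]:
    """Return the violated restrictions (an empty list if none found)."""
    problems = []
    ids = {s.profile_level_id for s in streams}
    # Restriction 1: one profile-level-id for the whole relationship.
    if len(ids) != 1:
        problems.append("profile-level-id values differ")
    # Restriction 2: ordered relationships only for Main Synthetic.
    if ordered and ids & {WAVETABLE_SYNTHESIS, GENERAL_MIDI}:
        problems.append("ordered relationship requires Main Synthetic (74)")
    # Restriction 3: at most one non-empty config; if none, exactly one
    # stream configures the engine via Appendix C.5 (approximated here).
    configured = [s for s in streams if s.config != ""]
    if len(configured) > 1:
        problems.append("more than one non-empty config")
    elif not configured and sum(s.render is not None for s in streams) != 1:
        problems.append("no unique Appendix C.5 configuration")
    return problems


# The identity-relationship General MIDI example of Section C.4.3.
print(relationship_problems([Mpeg4Stream(76, "e4"), Mpeg4Stream(76, "")],
                            ordered=False))  # []
```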

For MWPP streams in an ordered or identity relationship that use the
mwpp media type, at most one stream may specify a MIDI renderer (using
the tools described in Appendix C.5). Each MIDI rendering type may
define its own semantics with regard to identity and ordered
relationships.

C.4.2 The zerosync parameter

The RTP timestamp value of the first packet in a stream is not set to
zero. Instead, the RTP standard [2] mandates that the RTP timestamp is
initialized to a randomly chosen value, to guard against plaintext
attacks on encrypted streams. As a consequence, a receiver cannot
directly use RTP timestamps to play back two RTP streams in sync, even
if the sender is generating synchronized timestamps for the streams.

Note that the RTP Control Protocol (RTCP), a low-bandwidth feedback
channel that is paired with each RTP stream, includes a synchronization
feature. RTCP sender report (SR) packets code the current time in two
forms: the format of the RTP timestamp, and the 64-bit Network Time
Protocol (NTP) format.  A receiver may examine the NTP timestamps of
several RTCP streams, and use this information to compute the ongoing
temporal relationship between the RTP streams associated with the RTCP
streams.

For many MWPP applications, this RTCP-based method is a good way to
synchronize streams. In some applications, however, this method is not
optimal: a receiver cannot synchronize the streams until it has received
an RTCP sender report for each stream, which delays synchronization at
the start of the session.

The MWPP-specific SDP parameter zerosync provides an alternative
mechanism for MWPP stream synchronization. The zerosync parameter codes
the RTP timestamp offsets for each stream, so that streams that are
generated in a synchronized fashion may be played back in sync without
using RTCP feedback. The use of the zerosync parameter weakens the
security of RTP, as discussed in Section 7 of this memo.

The zerosync parameter supports two different ways to normalize RTP
timestamp fields. One mechanism is in effect if the zerosync parameter
takes on integer values in the range 1 to 4294967295. A second mechanism
is in effect if the zerosync parameter takes on the special value 0.

We first describe the synchronization behavior for non-zero values of
zerosync. This synchronization mechanism is designed for use with a set
of MWPP streams that form an ordered or identity relationship.  For a
relationship to use this mechanism, all streams in the relationship MUST
include a zerosync parameter set to a non-zero value, and the srate
rtpmap parameter (see Section 6.1) of all streams in the relationship
MUST have the same value.

Given these conditions, the normalized RTP timestamp for a packet in a
stream is computed by subtracting (modulo 2^32) the stream zerosync
parameter value from the original RTP timestamp of the packet.
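
The normalization is a single subtraction modulo 2^32, as the Python
sketch below shows; the normalized_timestamp name is hypothetical. The
zerosync values 1726 and 726 are taken from the first example in
Section C.4.3, and the sketch assumes, as the text requires, that both
streams share one srate.

```python
RTP_TS_MODULUS = 1 << 32  # RTP timestamps are 32-bit unsigned values


def normalized_timestamp(rtp_ts: int, zerosync: int) -> int:
    """Normalized RTP timestamp under the non-zero zerosync mechanism."""
    if zerosync <= 0:
        raise ValueError("zerosync=0 selects the shared-timebase mechanism")
    return (rtp_ts - zerosync) % RTP_TS_MODULUS


# Packets generated at the same instant, 44100 ticks (1 s at 44100 Hz)
# into the session, on streams with zerosync=1726 and zerosync=726.
print(normalized_timestamp(1726 + 44100, 1726))  # 44100
print(normalized_timestamp(726 + 44100, 726))    # 44100
```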

Next, we describe the synchronization behavior for zero-valued zerosync
parameters. All streams in a session with zerosync = 0 are generated
from a single RTP timebase. In other words, these streams simply ignore
the RTP requirement for random timestamp offsets.  All streams whose
zerosync values are set to 0 MUST have the same srate rtpmap parameter
value.

Note that a stream description may contain, at most, one zerosync
parameter assignment. A stream may participate in a non-zero-valued
zerosync behavior or a zero-valued zerosync behavior, but not both.

C.4.3 Multi-stream examples using midiport and zerosync

This section shows several session description examples that use the
midiport and zerosync parameters.

Our first example shows two mpeg4-generic MWPP streams that drive the
same General MIDI decoder.

m=audio 5004 RTP/AVP 61
c=IN IP4 169.229.60.64
a=rtpmap: 61 mpeg4-generic/44100
a=fmpt: 61 streamtype=5; mode=mwpp; config="e4"; profile-level-id=76;
a=fmpt: 61 midiport=12;zerosync=1726
m=audio 5006 RTP/AVP 62
c=IN IP4 169.229.60.64
a=rtpmap: 62 mpeg4-generic/44100
a=fmpt: 62 streamtype=5; mode=mwpp; config=""; profile-level-id=76;
a=fmpt: 62 midiport=12;zerosync=726

The two UDP streams in the session use different UDP ports (5004/5006)
that map to different RTP header PTYPE values (61 and 62). The profile-
level-id codes General MIDI. Note that only one config parameter is set
to a non-empty string. The midiport values indicate the streams share an
identity relationship; the presence of zerosync parameters with non-zero
values establishes the synchronization mechanism.

A variant on this example, whose session description is not shown, is to
have two streams in an identity relationship driving the same MIDI
renderer, each with a different transport type. One stream would use
UDP, and would be dedicated to real-time messages. A second stream would
use TCP, and would be dedicated to sending reliable bulk SysEx dumps.

In the next example, two mpeg4-generic MWPP streams form an ordered
relationship to drive a Structured Audio decoder with 32 MIDI voice
channels.

m=audio 5004 RTP/AVP 61
c=IN IP4 169.229.60.64
a=rtpmap: 61 mpeg4-generic/44100
a=fmpt: 61 streamtype=5; mode=mwpp; config=""; profile-level-id=74;
a=fmpt: 61 midiport=5;zerosync=0;
m=audio 5006 RTP/AVP 62
c=IN IP4 169.229.60.64
a=rtpmap: 62 mpeg4-generic/44100
a=fmpt: 62 streamtype=5; mode=mwpp; config=""; profile-level-id=74;
a=fmpt: 62 midiport=6;zerosync=0;

The sequential midiport pattern for the two streams establishes the
ordered relationship; the profile-level-id values of 74 indicate Main
Synthetic (i.e. Structured Audio). The midiport=5 stream maps to
Structured Audio extended channels range 0-15, the midiport=6 stream
maps to Structured Audio extended channels range 16-31. Both config
strings are empty; the Structured Audio decoder is configured by MWPP-
specific SDP parameters that are not shown above. Note the use of the
zero-valued zerosync option.
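
The channel mapping in this example suggests a simple rule: each stream
in an ordered relationship contributes 16 consecutive extended
channels, in midiport order. The Python sketch below assumes that rule
for illustration only (the normative semantics are those of Section
5.14.3.2.2 of [5]); the extended_channel name is hypothetical.

```python
def extended_channel(midiport: int, lowest_midiport: int, status: int) -> int:
    """Extended channel for a channel voice command in an ordered relationship.

    lowest_midiport is the smallest midiport value in the relationship.
    """
    if not 0x80 <= status <= 0xEF:
        raise ValueError("not a channel voice status octet")
    if midiport < lowest_midiport:
        raise ValueError("midiport precedes the ordered relationship")
    channel = status & 0x0F  # low nibble: MIDI channel 0-15
    return 16 * (midiport - lowest_midiport) + channel


print(extended_channel(5, 5, 0x90))  # 0  (NoteOn, channel 0, midiport=5)
print(extended_channel(6, 5, 0x9F))  # 31 (NoteOn, channel 15, midiport=6)
```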


Appendix C.5. SDP Definitions: MIDI Rendering

A MIDI command stream codes a series of high-level events, such as the
onset and termination of musical notes. A receiver turns this event
stream into audio (or, in some applications, into control actions such
as the dimming of stage lights) by applying a MIDI rendering algorithm.

By default, MWPP over RTP streams do not specify a rendering algorithm.
This default behavior assumes that the rendering algorithm is sent in-
band, via MIDI System Exclusive commands. The minimal mwpp stream
description in Section 6.1 exhibits this default behavior.

In contrast, the default rendering algorithm for mpeg4-generic streams
is the MPEG 4 synthesis algorithm coded in the SDP config parameter. The
minimal mpeg4-generic stream description in Section 6.2 exhibits this
default behavior.

In this Appendix, we define the SDP parameter "render" to override these
default rendering methods. Uses of the render parameter must obey the
restrictions defined in Appendix C.4.1.

This document defines two symbolic values for render: "default" and
"sasc". However, the render parameter is extensible. Ancillary IETF
documents may define other values for the render parameter. Receivers
MUST NOT participate in sessions if the session description sets the SDP
render parameter to a value that is not known by the receiver.

If the SDP parameter render takes on the value "default", the stream
uses the default rendering method, as defined in Section 6.1 (for media
type mwpp) or Section 6.2 (for media type mpeg4-generic).

We describe the use of the sasc value for the render parameter in the
following sub-section.

C.5.1 The sasc Method

The sasc method supports the flexible transport of the MPEG 4 Audio
AudioSpecificConfig() binary data block. This structure may contain the
configuration data for the General MIDI [1], DLS2 [18], or Structured
Audio [5] synthesis methods, as specified in [5].

Only an mpeg4-generic stream description may use the sasc method.  To
signal the use of sasc, the config parameter for the mpeg4-generic
stream MUST be set to the empty string, AND the SDP render parameter
MUST be set to the symbolic value sasc.

Two AudioSpecificConfig() transport parameters are defined by the sasc
method:

  o  The SDP parameter url may be assigned a string that contains
     a Uniform Resource Locator (URL) to the AudioSpecificConfig()
     data.

  o  The SDP parameter inline may be assigned a string that contains
     a Base64 encoding of a representation of AudioSpecificConfig().

Exactly one url parameter assignment or exactly one inline parameter
assignment MUST appear in a stream description that uses the sasc
method. The url and inline parameters MUST NOT both appear in the same
stream description.
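
These rules lend themselves to a mechanical check at session setup. The
Python sketch below is illustrative only: parse_fmpt is a hypothetical
helper that handles just the simple name=value; form used in this
memo's examples (no semicolons inside quoted strings), and check_sasc is
likewise hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def parse_fmpt(line: str) -> List[Tuple[str, str]]:
    """Split an 'a=fmpt: <pt> name=value; ...' line into (name, value) pairs."""
    _, rest = line.split(":", 1)
    _, params = rest.strip().split(None, 1)  # drop the payload type number
    pairs = []
    for item in params.split(";"):
        name, _, value = item.strip().partition("=")
        if name:
            pairs.append((name.strip(), value.strip().strip('"')))
    return pairs


def check_sasc(media_type: str, assignments: List[Tuple[str, str]]) -> None:
    """Raise ValueError if the stream's parameters break the sasc rules."""
    values = dict(assignments)
    if values.get("render") != "sasc":
        return  # the sasc rules apply only when render=sasc
    if media_type != "mpeg4-generic":
        raise ValueError("only mpeg4-generic streams may use sasc")
    if values.get("config") != "":
        raise ValueError("sasc requires config to be the empty string")
    counts = Counter(name for name, _ in assignments)
    if sorted((counts["url"], counts["inline"])) != [0, 1]:
        raise ValueError("need exactly one url or exactly one inline, not both")


lines = ['a=fmpt: 61 streamtype=5; mode=mwpp; config=""; profile-level-id=76;',
         'a=fmpt: 61 render=sasc; url="http://www.berkeley.edu/oski.sasc";']
check_sasc("mpeg4-generic", [p for ln in lines for p in parse_fmpt(ln)])  # passes
```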

The sasc method is based on MIME [17]. We consider sasc to be a MIME
subtype for the audio media type. The SDP parameters we define in the
remainder of this sub-section may also act as MIME parameters for the
audio/sasc MIME type. If the url parameter is used in a stream
description, the coded URL SHOULD return a MIME document of type
audio/sasc.

We define the following SDP/MIME parameters for use with the sasc
method:

  o compr. The compr parameter indicates which lossless compression
    algorithm is in use to reduce the size of AudioSpecificConfig().
    Compression occurs before any content transfer encoding (such as
    the Base64 encoding for the inline parameter).

    This memo defines two legal values for compr: none (for no
    compression) and gzip (for the gzip compression algorithm as
    defined in [19]). The default value for compr is gzip.

    The compr parameter is an extensible parameter; other IETF
    documents may define new compression methods. Receivers MUST
    NOT participate in a session if the session description sets
    the compr parameter to a value that is not known by the receiver.

  o cid. The cid parameter is assigned a string value that
    encodes a globally unique identifier for the content encoded
    in the AudioSpecificConfig().

    The cid value supports cache management: if a receiver notices
    it has previously used an AudioSpecificConfig(), it can avoid
    redundant transmission or decoding.

    If an AudioSpecificConfig() is coded in a MIME document, the
    Content-ID header [17] value MUST match the cid value in the
    stream description. Using the cid parameter in a MIME document
    is legal but redundant, because Content-ID also codes the string.
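
The ordering of compression and transfer encoding, and the caching role
of cid, can be sketched with Python's standard base64 and gzip modules.
The function names and the in-memory cache are hypothetical; a real
receiver would also handle the url parameter and the MIME Content-ID
header, which this sketch omits.

```python
import base64
import gzip
from typing import Dict, Optional


def encode_inline(asc: bytes, compr: str = "gzip") -> str:
    """Sender side: compress (if requested), then apply Base64."""
    if compr == "gzip":
        asc = gzip.compress(asc)
    elif compr != "none":
        raise ValueError(f"unsupported compr value: {compr!r}")
    return base64.b64encode(asc).decode("ascii")


def decode_inline(inline: str, compr: str = "gzip") -> bytes:
    """Receiver side: undo Base64, then undo compression."""
    data = base64.b64decode(inline, validate=True)
    if compr == "gzip":
        return gzip.decompress(data)
    if compr == "none":
        return data
    # An unknown compr value means the receiver must not join the session.
    raise ValueError(f"unsupported compr value: {compr!r}")


_asc_cache: Dict[str, bytes] = {}


def audio_specific_config(inline: str, compr: str = "gzip",
                          cid: Optional[str] = None) -> bytes:
    """Decode an inline AudioSpecificConfig(), reusing earlier results by cid."""
    if cid is not None and cid in _asc_cache:
        return _asc_cache[cid]  # previously used: skip decoding
    asc = decode_inline(inline, compr)
    if cid is not None:
        _asc_cache[cid] = asc
    return asc


placeholder = bytes(range(16))  # stand-in octets, not a real AudioSpecificConfig()
assert audio_specific_config(encode_inline(placeholder), cid="example-cid") == placeholder
```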

If these parameters are in use for a stream, SDP fmpt lines that assign
values to these parameters MUST appear in the session description. In
addition, if the stream description uses the url parameter to encode a
MIME document, the MIME version of these parameters SHOULD appear in the
MIME document, unless the parameter definition indicates otherwise.

We now show stream description examples for the sasc method. The stream
description below uses the inline SDP parameter to code the
AudioSpecificConfig() block for an mpeg4-generic General MIDI stream.
This stream has the same characteristics as the example shown in Section
6.2.

m=audio 5004 RTP/AVP 61
c=IN IP4 169.229.60.64
a=rtpmap: 61 mpeg4-generic/44100
a=fmpt: 61 streamtype=5; mode=mwpp; config=""; profile-level-id=76;
a=fmpt: 61 render=sasc; inline="e4"; compr=none;

Note that the empty value of config signals the use of MWPP-specific
decoder configuration. We use a General MIDI stream in this example for
didactic purposes; in practice, the sasc method would not be used for a
General MIDI stream, because the configuration string is trivially
short.

The stream description below uses the url SDP parameter to code the
AudioSpecificConfig() block for the same General MIDI stream:

m=audio 5004 RTP/AVP 61
c=IN IP4 169.229.60.64
a=rtpmap: 61 mpeg4-generic/44100
a=fmpt: 61 streamtype=5; mode=mwpp; config=""; profile-level-id=76;
a=fmpt: 61 render=sasc; url="http://www.berkeley.edu/oski.sasc";
a=fmpt: 61 cid="xjflsoeiurvpa09itnvlduihgnvet98pa3w9utnuighbuk";

In this example, the MIME-encoded document oski.sasc, of MIME type
audio/sasc, contains the AudioSpecificConfig(). The default gzip
compression is used on the AudioSpecificConfig(), and the cid value
matches the Content-ID value of oski.sasc.

Appendix D. Author Addresses

John Lazzaro (corresponding author)
UC Berkeley
CS Division
315 Soda Hall
Berkeley CA 94720-1776
Email: lazzaro@cs.berkeley.edu

John Wawrzynek
UC Berkeley
CS Division
631 Soda Hall
Berkeley CA 94720-1776
Email: johnw@cs.berkeley.edu

Appendix E. References

[1] MIDI Manufacturers Association. The complete MIDI 1.0
detailed specification, 1996. http://www.midi.org

[2] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson.
RTP: A transport protocol for real-time applications. Work
in progress, draft-ietf-avt-rtp-new-11.txt.

[3] H. Schulzrinne and S. Casner. RTP Profile for Audio and Video
Conferences with Minimal Control. Work in progress,
draft-ietf-avt-profile-new-12.txt.

[4] Internet Engineering Task Force. Transport of MPEG-4 Elementary
Streams.  Work in progress, draft-ietf-avt-mpeg4-simple-02.txt.

[5] International Standards Organization. ISO 14496 MPEG-4,
Part 3 (Audio) Subpart 5 (Structured Audio) 1999.

[6] John Lazzaro and John Wawrzynek. A Case for Network
Musical Performance. The 11th International Workshop on Network
and Operating Systems Support for Digital Audio and Video
(NOSSDAV 2001) June 25-26, 2001, Port Jefferson, New York.
http://www.cs.berkeley.edu/~lazzaro/sa/pubs/pdf/nossdav01.pdf

[7] Sfront source code release, includes a Linux networking
client that implements the MIDI RTP packetization.
http://www.cs.berkeley.edu/~lazzaro/sa/

[8] Dominique Fober, Yann Orlarey, Stephane Letz. Real Time Musical
Events Streaming over Internet. Proceedings of the International
Conference on WEB Delivering of Music 2001, pages 147-154.
http://www.grame.fr/~fober/RTESP-Wedel.pdf

[9] M. Handley, V. Jacobson and C. Perkins. SDP: Session Description
Protocol. Work in progress, draft-ietf-mmusic-sdp-new-10.txt.

[10] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston,
J. Peterson, R. Sparks, M. Handley, and E. Schooler. SIP: Session
Initiation Protocol. Work in progress,
draft-ietf-sip-rfc2543bis-09.txt.

[11] J. Rosenberg and H. Schulzrinne. An Offer/Answer Model with
SDP. Work in progress, draft-ietf-mmusic-sdp-offer-answer-02.txt.

[12] H. Schulzrinne, A. Rao, and R. Lanphier. Real Time Streaming
Protocol (RTSP). Work in progress,
draft-ietf-mmusic-rfc2326bis-00.txt.

[13] D. D. Clark and D. L. Tennenhouse, "Architectural considerations
for a new generation of protocols," in SIGCOMM Symposium on
Communications Architectures and Protocols, (Philadelphia,
Pennsylvania), pp. 200-208, IEEE, Sept. 1990.  Computer
Communications Review, Vol. 20(4), Sept. 1990.

[14] C. Bormann et al. RFC 3095: RObust Header Compression (ROHC).
Internet Engineering Task Force, July 2001. Also see related work at
http://www.ietf.org/html.charters/rohc-charter.html.

[15] D. Yon. Connection-Oriented Media Transport in SDP.  Work in
progress, draft-ietf-mmusic-sdp-comedia-03.txt.

[16] International Standards Organization. ISO 14496 MPEG-4, Part 3
(Audio) Subpart 1 (Main Document) 1999.

[17] N. Freed and N. Borenstein. MIME Part 1: Format of Internet
Message Bodies. RFC 2045, November 1996.

[18] MIDI Manufacturers Association. The MIDI Downloadable Sounds
Specification, v98.2. Available for purchase at http://www.midi.org.

[19] P. Deutsch. GZIP file format specification version 4.3. RFC 1952,
May 1996.



Appendix F. Expiration Notice

This document expires December 28, 2002.