Network Working Group                                        S. Pfeiffer
Internet-Draft                                                 C. Parker
Expires: September 20, 2005                                      A. Pang
                                                                   CSIRO
                                                          March 19, 2005


  The Annodex exchange format for time-continuous bitstreams, Version
                                  3.0
                       draft-pfeiffer-annodex-02

Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of Section 3 of RFC 3667.  By submitting this Internet-Draft, each
   author represents that any applicable patent or other IPR claims of
   which he or she is aware have been or will be disclosed, and any of
   which he or she become aware will be disclosed, in accordance with
   RFC 3668.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 20, 2005.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   This specification defines "Annodex", an exchange format for
   annotated and indexed time-continuous bitstreams.  Annodex provides a
   bitstream format for exchanging multitrack interleaved
   time-continuous bitstreams and textual meta information attached to



Pfeiffer, et al.       Expires September 20, 2005               [Page 1]


Internet-Draft                   ANNODEX                      March 2005


   temporal fragments of the binary bitstreams.  The meta information is
   given in the Continuous Media Markup Language (CMML).  Annodex
   enables integration of time-continuous bitstreams into the browsing
   and searching functionality of the World Wide Web.

   The specification is not encumbered by patents.  The Annodex format
   is protected by a trade mark to prevent the use of the term "Annodex"
   for any related but non-conformant and therefore non-interoperable
   technology.  Conformant technology is encouraged to use the term
   "Annodex" when referring to the exchange format.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.

Table of Contents

   1.  Introduction
     1.1   Motivation
     1.2   Overview
   2.  Features of Annodex
   3.  Authoring exchange format
   4.  The Ogg skeleton logical bitstream
     4.1   The format of the skeleton ident header
     4.2   The format of the skeleton secondary headers
     4.3   Media mapping of skeleton into Ogg
   5.  Handling time in Annodex format bitstream
     5.1   Conceptual overview
     5.2   Mapping a granule position to a time position
     5.3   Addressing/seeking into the bitstream
     5.4   Remultiplexing a bitstream
   6.  MIME media type applications
     6.1   MIME media type registration for 'application/annodex'
       6.1.1   URI addressing into Annodex bitstreams
       6.1.2   HTTP 'Accept' header field interpretation
     6.2   MIME media type registration for 'video/annodex'
     6.3   MIME media type registration for 'audio/annodex'
   7.  Security considerations
   8.  ChangeLog
   9.  References
       Authors' Addresses
   A.  Definitions of terms and abbreviations
   B.  Glossary of acronyms
   C.  Acknowledgments
       Intellectual Property and Copyright Statements

1.  Introduction

1.1  Motivation

   For Web search, time-continuous data such as audio and video files
   are currently "dark matter" outside the existing infrastructure of
   the World Wide Web: it is not possible to look inside such files,
   search for their content through common text-based search engines,
   or hyperlink directly to points of interest inside them.  Such a
   file can generally only be consumed in its entirety.  In addition,
   such files are "dead ends": once a user consumes their content, the
   hyperlinking functionality of the Web is left behind.

   Text documents were enabled for the Web through the definition of a
   markup language, HTML [1], which describes the structure of a
   document and thus allows content to be separated from presentation.
   This specification takes the same approach for time-continuous
   documents.  The markup language for time-continuous documents is
   called CMML, short for Continuous Media Markup Language [2].  It
   describes the structure of time-continuous documents and allows for
   a clean separation of content from presentation.

   A text document becomes a Web resource that can be exchanged between
   different applications once HTML markup is added to it.  Likewise,
   time-continuous documents need an exchange format in which CMML is
   merged with the time-continuous document(s) it describes, both to
   turn them into Web resources and to provide a standard exchange
   format between applications.  This format is called "Annodex", for
   annotated and indexed documents, and is defined here.

1.2  Overview

   Annodex uses a container format that allows transport and storage of
   interleaved time-synchronous bitstreams.  Following the clean
   layering approach familiar from Internet protocols, the functionality
   of the container format and that of CMML are explicitly separated.
   Each layer solves a specific problem without depending on layers
   further up in functionality.  The container format of Annodex is the
   Ogg encapsulation format version 0 [3].  An Annodex bitstream is an
   Ogg bitstream containing a "skeleton" and a CMML logical bitstream,
   in addition to other temporally interleaved data bitstreams.  Ogg
   skeleton is a logical bitstream that describes all the other logical
   bitstreams contained in the Ogg physical bitstream (see Section 4).
   Its purpose is to remove codec-specific information requirements
   from the multiplexing/demultiplexing process.

   Only an Annodex bitstream that contains a CMML bitstream can be
   regarded as a Web resource and as part of the Web, because only then
   can it be searched and browsed.  An Ogg bitstream without a CMML
   bitstream is not an Annodex bitstream, but only an Ogg bitstream with
   a "skeleton" logical bitstream.  Such a bitstream is still valuable
   as a multitrack media format that can be addressed through temporal
   hyperlinks [4]; however, it is not a first-class citizen on the Web,
   because Web search engines cannot index and crawl it.

   The file extension of Annodex files is ".anx".  This document also
   applies for the registration of the MIME type "application/annodex"
   for Annodex format bitstreams.  In the meantime,
   "application/x-annodex" will be used.  The further MIME types that
   this document applies for are "video/annodex" for Annodex format
   (possibly multitrack) video and "audio/annodex" for Annodex format
   (possibly multitrack) audio.

   This document assumes that the reader understands the Ogg
   encapsulation format version 0 [3].  Knowledge of the network
   protocols HTTP [5] and RTP/RTSP [6], as well as of the extension of
   URIs to address temporal offsets into Web resources [4], is also a
   prerequisite to understanding this document.  To find out more about
   the use of Annodex for creating searchable and surfable Web
   resources, refer to the specification of the Continuous Media Markup
   Language (CMML Version 2.0) [2].

2.  Features of Annodex

   Annodex contains interleaved bitstreams of time-related data.  It is
   designed to be used both as a persistent file format and as a
   streaming format to exchange temporally addressable bitstreams.  It
   enables encapsulation of any type of time-continuous bitstream as
   long as it is streamable and is based on a regular data sampling rate
   (called granulerate).  For variable sampling rate bitstreams, the
   least common multiple of the sampling rates used must be known.  With
   this container format, Annodex is designed to accommodate any current
   or future compression format for time-continuous bitstreams.
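   As an illustration only, the following Python sketch computes such a
   least common multiple.  The helper name common_granulerate is
   hypothetical, and it assumes every sampling rate is a positive
   integer number of Hz; rational rates would need a different
   computation.

```python
from functools import reduce
from math import gcd


def common_granulerate(sampling_rates):
    """Least common multiple of integer sampling rates in Hz.

    Hypothetical helper: assumes every rate is a positive integer.
    """
    return reduce(lambda a, b: a * b // gcd(a, b), sampling_rates)


# 44100 Hz and 48000 Hz audio share a common rate of 7056000 Hz.
print(common_granulerate([44100, 48000]))  # 7056000
```

   Every rate in the list divides the result, so each sample position
   at any of the rates falls on a whole granule.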

   The container format that Annodex is based on is designed to allow
   several tracks of temporally synchronous time-continuous data.  Each
   track represents codec data for one type of time-continuous data
   stream.  Here is an example Annodex bitstream with data bitstreams
   D1-D3 (for example, a video track and two audio tracks) and an
   annotation track A1 (a CMML bitstream).


       __________________________________________________________________

   D1  |    |   |        |         |    |        |      |       |   |   |
       __________________________________________________________________

   D2  |          |            |            |             |          |  |
       __________________________________________________________________

   D3  |  |   |  |  |   |   |  |  |   |   |  |  |  |   |   |  |   |   | |
       __________________________________________________________________

   A1  | clip 1                       | --  | clip 2      | clip 3      |
       __________________________________________________________________

   The time axis                                                         t
       |----------------------------------------------------------------->


   Bitstreams of time-continuous data are regarded as a sequence of
   data packets, each of which has a timestamp representing the time at
   which the packet's data ends.  Each packet contains all the data
   required to cover the interval since the previous packet.  If a
   packet does not cover the full interval, it MUST cover the end part
   of the interval.
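   A minimal Python sketch of this packet model, assuming the end
   timestamps are already available as exact fractions of a second:
   each packet is paired with the interval running from the previous
   packet's end to its own end.  The function packet_intervals and its
   stream_start parameter are hypothetical.

```python
from fractions import Fraction


def packet_intervals(end_times, stream_start=Fraction(0)):
    """Pair each packet's end timestamp with the interval it covers.

    Packet i covers the interval from the end of packet i-1 to its own
    end; the first packet's interval starts at stream_start (a
    hypothetical parameter of this sketch).
    """
    intervals = []
    previous = stream_start
    for end in end_times:
        if end < previous:
            raise ValueError("packet end timestamps must not decrease")
        intervals.append((previous, end))
        previous = end
    return intervals


# Three 40 ms video frames at 25 frames per second.
frame_ends = [Fraction(n, 25) for n in range(1, 4)]
for start, end in packet_intervals(frame_ends):
    print(f"{float(start):.2f}-{float(end):.2f} sec")
```

   Exact fractions are used so that interval boundaries from different
   tracks compare without floating point rounding.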

   Bitstreams that represent data to be presented at a single time
   instant are called time-instantaneous bitstreams.  The timestamp of
   such a packet represents the time at which the packet's data both
   starts and ends.  The CMML track A1 above is one such bitstream.  Its clips
   represent time-instantaneous data that is displayed at the given
   timestamp.  Each subsequent data packet replaces the information of
   the previous one.  To insert a gap in a data bitstream (as in A1
   above), a data packet MUST be inserted that explicitly annuls the
   data.
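   The replacement semantics can be sketched in Python as a lookup of
   the last packet at or before a given time.  In this sketch a packet
   whose clip is None stands for the explicit annulling packet; the
   function name active_clip and the timestamps for track A1 are
   hypothetical.

```python
import bisect


def active_clip(packets, t):
    """Return the clip in effect at time t, or None.

    packets: list of (timestamp, clip) sorted by timestamp, where clip
    is None for a packet that explicitly annuls the previous data.
    """
    times = [ts for ts, _ in packets]
    i = bisect.bisect_right(times, t)  # number of packets with ts <= t
    if i == 0:
        return None  # nothing is displayed before the first packet
    return packets[i - 1][1]


a1 = [(0.0, "clip 1"), (4.5, None), (5.5, "clip 2"), (8.0, "clip 3")]
print(active_clip(a1, 2.0))  # clip 1
print(active_clip(a1, 5.0))  # None (the gap)
print(active_clip(a1, 9.0))  # clip 3
```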

   Data bitstreams generally contain the following information:
   o  setup information for a codec
   o  content data
   The setup information is inserted at the start of a data bitstream
   before any content data.

   Annodex format bitstreams are distributed using a network protocol
   such as HTTP [5] or RTP/RTSP [6].  The basic process is as follows:
   the client dispatches a download or streaming request for a certain
   URI to the server.  The server resolves the URI and starts
   delivering the Annodex format bitstream, taking into account any
   offsets addressed in the URI.  Currently, distribution via HTTP is
   well understood and is discussed in this document, while the details
   of distribution via RTP/RTSP have not yet been examined and are thus
   unspecified; in particular, an RTP payload format needs to be
   defined for Annodex.

   The following figure explains the protocol stack:


       ________   _________   _________   __________
                                                       \
       | CMML |   | Video |   | Audio |   | ...    |    |
       ________   _________   _________   __________    |
                                                        |
       | skeleton                                  |     > Annodex
       _____________________________________________    |
                                                        |
       | Ogg                                       |    |
       _____________________________________________   /

       | HTTP               | RTSP                 |
       |                    _______________________|
       |
       |                    | RTP                  |
       _____________________________________________

       | TCP                | UDP                  |
       _____________________________________________

       | IP                                        |
       _____________________________________________

   The Annodex format has been designed to accommodate both reliable and
   unreliable transport.  With an unreliable transport, packets may be
   lost; depending on the application, such loss may or may not matter
   and may thus need to be addressed.  All data, including CMML data, is
   treated with the same importance.  For time-instantaneous data
   tracks, if one packet is lost, the next packet restores the proper
   state.  We envisage, however, that a client may require the current
   state information, so there should be a protocol request for
   re-sending the current state.  The server will respond by inserting
   another copy of the instantaneous data into the Annodex bitstream.
   For example, a clip within an annotation bitstream can be repeated
   in the Annodex bitstream by giving it the same "track" attribute and
   the same page_sequence_number as the previous "clip" element.  This
   handling of unreliable transport relates mostly to the use of
   Annodex over RTP/RTSP and UDP and needs further elaboration.

   In short, Annodex bitstreams make it possible to:
   o  index clips of Annodex content for retrieval, e.g. with a Web
      search engine.
   o  crawl Webs of Annodex and other Web resources, e.g. during an
      indexing operation of a Web search engine.
   o  directly address and retrieve temporal intervals inside the
      Annodex bitstream without a need to decode logical bitstreams
      aside from skeleton.
   o  directly address and retrieve named clips inside the Annodex
      bitstream without a need to decode any more than the skeleton and
      CMML logical bitstreams.
   o  extract, cache, and reuse temporal intervals or named clips while
      retaining the annotation and index information.
   o  browse through Webs of Annodex and other Web resources in an
      integrated manner, making time-continuous content a first-class
      citizen on the World Wide Web.

3.  Authoring exchange format

   CMML [2] is defined for authoring Annodex bitstream information.
   CMML's "stream" tag has been designed to author the skeleton
   bitstream and describe the data bitstreams to be interleaved into an
   Ogg bitstream.  All other tags of a CMML file provide for authoring
   of the CMML bitstream.  Use of a CMML bitstream without skeleton is
   strongly discouraged, as the time referencing and clip recomposition
   functionality of Annodex would be lost.

   An Annodex physical bitstream has the following mandatory order of
   Ogg pages:
   1.  skeleton bos page.
   2.  CMML bos page.
   3.  bos pages of the other logical bitstreams.
   4.  secondary header pages of all logical bitstreams, including
       fisbone.
   5.  skeleton eos page.
   6.  data and eos pages of logical bitstreams, excluding skeleton,
       multiplexed in a time-synchronous fashion.
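   The ordering rules above can be checked with a small state machine.
   The following Python sketch works on a hypothetical, simplified page
   model of (stream, kind) pairs rather than on real Ogg pages; a real
   tool would derive these pairs from Ogg page flags and each codec's
   number of header packets.  The function name check_page_order is
   also hypothetical.

```python
def check_page_order(pages):
    """Return None if pages follow the mandatory Annodex order, else an error.

    pages: sequence of (stream, kind) tuples, where stream is "skeleton",
    "cmml" or another stream name and kind is "bos", "header", "eos" or
    "data" (a hypothetical page model for this sketch).
    """
    pages = list(pages)
    if not pages or pages[0] != ("skeleton", "bos"):
        return "first page must be the skeleton bos page"
    if len(pages) < 2 or pages[1] != ("cmml", "bos"):
        return "second page must be the CMML bos page"
    step = 3  # numbered as in the list of mandatory page order
    for index, (stream, kind) in enumerate(pages[2:], start=2):
        if step == 3 and kind == "bos" and stream not in ("skeleton", "cmml"):
            continue                      # 3. bos pages of other bitstreams
        if step <= 4 and kind == "header":
            step = 4                      # 4. secondary header pages
            continue
        if step <= 4 and (stream, kind) == ("skeleton", "eos"):
            step = 6                      # 5. skeleton eos page
            continue
        if step == 6 and stream != "skeleton" and kind in ("data", "eos"):
            continue                      # 6. multiplexed data and eos pages
        return f"unexpected {kind} page of {stream} at index {index}"
    if step != 6:
        return "skeleton eos page is missing"
    return None


good = [("skeleton", "bos"), ("cmml", "bos"), ("theora", "bos"),
        ("skeleton", "header"), ("theora", "header"), ("cmml", "header"),
        ("skeleton", "eos"), ("theora", "data"), ("cmml", "data"),
        ("theora", "eos"), ("cmml", "eos")]
print(check_page_order(good))  # None
```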
   Such an Annodex bitstream is identified by the CMML bitstream's magic
   number, which can be found at Byte position 120 for this version of
   the "skeleton" specification.  This position follows from the size
   of the skeleton bos page, which is fixed because both the skeleton
   ident header and the Ogg page header are of fixed size.  The Ogg page
   header has 28 Bytes (including a one Byte segment table, as this
   page always has less than 255 Bytes of packet content), and the
   skeleton ident header has 64 Bytes (see Section 4.1).  The CMML bos
   page adds another 28 Byte page header, so the Byte position amounts
   to 28+64+28 = 120.  The CMML bos page MUST thus also have less than
   255 Bytes of packet content, which is a sensible restriction.
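   To illustrate the arithmetic, the following Python sketch derives the
   position of the CMML magic number from the fixed sizes, taking the
   64 Byte size of the skeleton ident header from its field layout
   (Bytes 0-63) in Section 4.1, and then performs a quick
   identification check.  The names are hypothetical; the check only
   compares magic numbers and does not verify Ogg page checksums or
   header flags.

```python
OGG_PAGE_HEADER_SIZE = 27 + 1  # fixed Ogg page header + one-Byte segment table
SKELETON_IDENT_SIZE = 64       # skeleton ident packet, Bytes 0-63
CMML_MAGIC = b"CMML\0\0\0\0"

# skeleton bos page (page header + ident packet), then the CMML page header
CMML_MAGIC_OFFSET = OGG_PAGE_HEADER_SIZE + SKELETON_IDENT_SIZE + OGG_PAGE_HEADER_SIZE


def looks_like_annodex(data: bytes) -> bool:
    """Cheap check on the first Bytes of a file or HTTP response body."""
    fishead_at = OGG_PAGE_HEADER_SIZE
    return (
        data[:4] == b"OggS"                                 # Ogg capture pattern
        and data[fishead_at:fishead_at + 8] == b"fishead\0"  # skeleton ident
        and data[CMML_MAGIC_OFFSET:CMML_MAGIC_OFFSET + 8] == CMML_MAGIC
    )


print(CMML_MAGIC_OFFSET)  # 120
```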

   The CMML media mapping is defined in the CMML [2] specification.
   However, for identification of an Annodex bitstream, the bos page of
   the CMML logical bitstream needs to be identifiable.  This is
   provided through the first 12 Bytes of the CMML ident packet, which
   contain the magic numbers and the version information (other fields
   exist and are described in the CMML [2] specification):
   1.  Identifier: an 8 Byte field that identifies this bitstream as a
       CMML logical bitstream.  It contains the magic numbers:
          0x43 'C'
          0x4d 'M'
          0x4d 'M'
          0x4c 'L'
          0x00 '\0'
          0x00 '\0'
          0x00 '\0'
          0x00 '\0'
   2.  Version major: 2 Byte unsigned integer signifying the major
       version number of the CMML format bitstream.
   3.  Version minor: 2 Byte unsigned integer signifying the minor
       version number of the CMML format bitstream.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Identifier 'CMML\0\0\0\0'                                     | 0-3
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               | 4-7
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Version major                 | Version minor                 | 8-11
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | ...
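
   A minimal Python sketch of reading these first 12 Bytes follows.
   It assumes the version fields are encoded LSB first, as this
   document specifies for the skeleton headers; the CMML [2]
   specification is authoritative for the CMML packet.  The function
   name and the illustrative version 2.0 are hypothetical.

```python
import struct

CMML_IDENTIFIER = b"CMML\0\0\0\0"


def parse_cmml_ident_prefix(packet: bytes):
    """Return (version_major, version_minor) from a CMML ident packet.

    Assumes LSB-first (little-endian) 2 Byte version fields.
    """
    if len(packet) < 12 or packet[:8] != CMML_IDENTIFIER:
        raise ValueError("not a CMML ident packet")
    major, minor = struct.unpack_from("<HH", packet, 8)
    return major, minor


ident = CMML_IDENTIFIER + struct.pack("<HH", 2, 0)
print(parse_cmml_ident_prefix(ident))  # (2, 0)
```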

4.  The Ogg skeleton logical bitstream

   The purpose of Ogg skeleton is to provide the codec-specific
   knowledge that allows parsing, demultiplexing, and remultiplexing of
   Ogg bitstreams without having to decode the data they contain.

   While the Ogg encapsulation format by itself is capable of
   interleaving an unlimited number of time-continuous bitstreams, it is
   not possible to identify the type of the bitstreams (e.g. audio or
   video) and their encoding format (e.g. Vorbis, Speex, or Theora)
   without decoding at least the bos page of each logical bitstream.
   Further general media type information, such as the image dimensions
   of a frame in a video bitstream or the language of a speech
   bitstream, may also be provided in skeleton.  Another limitation of
   Ogg is that each logical bitstream defines its own mapping of
   granule_position to time; this mapping is therefore also given in
   the skeleton.

   This situation is not acceptable for Annodex, because an Annodex
   server must be able to return media format information for an Annodex
   resource without having to understand the codecs involved.  It must
   also be able to return temporal subparts of an Annodex resource
   without decoding the data bitstreams.

   An addition to the Ogg format that describes all the logical
   bitstreams included in the Ogg stream is thus necessary.  It is
   defined as a logical bitstream called the "skeleton".  For Annodex
   bitstreams, use of a skeleton bitstream is mandatory.  This section
   specifies the content of the "skeleton" logical bitstream and how it
   is mapped into Ogg.  Knowledge of the Ogg bitstream format as
   specified in the Ogg RFC [3] is presumed.  Please also refer to that
   document for descriptions of the terms used in this document.

   The skeleton bitstream can generically describe Ogg bitstreams that
   consist of one or more time-continuous data bitstreams and one or
   more time-instantaneous data bitstreams concurrently interleaved (in
   Ogg terms: multiplexed).  It does not describe sequentially
   multiplexed Ogg bitstreams; rather, it expects each sequentially
   multiplexed bitstream to have its own skeleton logical bitstream.

   The skeleton logical bitstream provides the following functionality
   on top of Ogg:
   o  allows for the identification of the codec format and the content
      type of encapsulated logical bitstreams without the need to decode
      that bitstream's headers or data.
   o  allows for extraction of a temporal interval of the Ogg physical
      bitstream while retaining the original start time offset of that
      interval.
   o  allows for attachment of a real-world wall-clock time and a date
      to the Ogg physical bitstream, thus retaining e.g. the creation
      date/time or first broadcast date/time.
   o  allows for temporal offset operations into an Ogg physical
      bitstream without a need to decode any data.
   o  allows generally for handling of content without a need to decode
      it, such as is necessary in a caching Web proxy.
   o  allows for attachment of message header fields given as name-value
      pairs that contain some sort of protocol messages about the
      logical bitstream, e.g. the screen size for a video bitstream or
      the number of channels for an audio bitstream.

   CMML [2] can be used for authoring the skeleton bitstream
   information; CMML's "stream" tag has been designed with that purpose
   in mind.  However, it is not mandatory to use CMML for authoring
   skeleton information: that information may well originate from a
   different source and be written directly into the skeleton
   bitstream.  See the CMML specification [2] for more details.

4.1  The format of the skeleton ident header

   The skeleton logical bitstream starts with an ident header containing
   information for the complete Ogg physical bitstream.  The ident
   header has the following format:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Identifier 'fishead\0'                                        | 0-3
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               | 4-7
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Version major                 | Version minor                 | 8-11
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Presentationtime numerator                                    | 12-15
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               | 16-19
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Presentationtime denominator                                  | 20-23
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               | 24-27
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Basetime numerator                                            | 28-31
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               | 32-35
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Basetime denominator                                          | 36-39
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               | 40-43
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | UTC                                                           | 44-47
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               | 48-51
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               | 52-55
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               | 56-59
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               | 60-63
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Fields with more than one Byte length are encoded LSB (least
   significant Byte) first.

   The fields in the skeleton ident header have the following meaning:
   1.  Identifier: an 8 Byte field that identifies this bitstream as a
       skeleton.  It contains the magic numbers:
          0x66 'f'
          0x69 'i'
          0x73 's'
          0x68 'h'
          0x65 'e'
          0x61 'a'
          0x64 'd'
          0x00 '\0'
   2.  Version major: 2 Byte unsigned integer signifying the major
       version number of the skeleton bitstream.  This document
       specifies the major version 3.
   3.  Version minor: 2 Byte unsigned integer signifying the minor
       version number of the skeleton bitstream.  This document
       specifies the minor version 0.
   4.  Presentationtime numerator & denominator: 8 Byte signed integer
       each.  Together they represent the time at which to start
       presenting the Ogg physical bitstream, given as a rational
       number.  The denominator represents the temporal resolution at
       which the presentationtime is given.  For example, a numerator
       of 5 and a denominator of 1000 result in a presentationtime of
       0.005 sec.  This enables a very high temporal resolution without
       having to store floating point numbers.  In a newly created
       physical bitstream, presentationtime and basetime are the same.
       When remultiplexing a subpart of the stream, this number MUST be
       adapted to the requested start time offset of the newly created
       stream.
   5.  Basetime numerator & denominator: 8 Byte signed integer each.
       Together they represent the basetime of the Ogg physical
       bitstream, given as a rational number like the presentationtime.
       This number is fixed once the physical bitstream is created and
       provides a mapping to time for the beginning of the physical
       bitstream when it starts with a granule position of 0.
   6.  UTC: a 20 Byte string containing a UTC time in the form of
       YYYYMMDDTHHMMSS.sssZ.  It associates a calendar date and a
       wall-clock time with the basetime.  It is a sequence of 20 NUL
       Bytes if not in use, making this ident packet, and thus the bos
       page of the skeleton bitstream, constant in length.
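
   The following Python sketch packs and parses a version 3.0 skeleton
   ident packet using the field sizes and LSB-first encoding given
   above.  The helper names pack_fishead and parse_fishead are
   hypothetical; times are passed as (numerator, denominator) pairs so
   that the chosen temporal resolution is stored as given, and only
   the parsed values are normalised to exact fractions of a second.

```python
import struct
from fractions import Fraction

FISHEAD_MAGIC = b"fishead\0"
# identifier, version major/minor, presentationtime num/den,
# basetime num/den, UTC; "<" = LSB first, no padding: 64 Bytes
FISHEAD_FORMAT = "<8sHHqqqq20s"


def pack_fishead(presentationtime, basetime, utc=None):
    """Build a version 3.0 skeleton ident packet (hypothetical helper)."""
    if utc is None:
        utc = bytes(20)  # 20 NUL Bytes when no UTC time is attached
    if len(utc) != 20:
        raise ValueError("UTC must be 20 Bytes: YYYYMMDDTHHMMSS.sssZ")
    return struct.pack(FISHEAD_FORMAT, FISHEAD_MAGIC, 3, 0,
                       *presentationtime, *basetime, utc)


def parse_fishead(packet):
    """Decode a skeleton ident packet into version, times and UTC."""
    magic, major, minor, pn, pd, bn, bd, utc = struct.unpack_from(
        FISHEAD_FORMAT, packet)
    if magic != FISHEAD_MAGIC:
        raise ValueError("not a skeleton ident packet")
    return {
        "version": (major, minor),
        "presentationtime": Fraction(pn, pd),  # seconds
        "basetime": Fraction(bn, bd),          # seconds
        "utc": None if utc == bytes(20) else utc.decode("ascii"),
    }


# A newly created bitstream: presentationtime equals basetime.
packet = pack_fishead((5, 1000), (5, 1000), b"20050319T120000.000Z")
info = parse_fishead(packet)
print(len(packet), info["version"], float(info["presentationtime"]))
# 64 (3, 0) 0.005
```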

   Please note: The possible temporal resolution of the presentationtime
   and basetime is on the order of 2^-64.  For example, the time formats
   in use for media that are described in this document range from 1/24
   to 1/60 sec for the different SMPTE formats.  This resolution is
   sufficient for any one of these.  It is also expected to accommodate
   any future needs of time resolution for any other time format and
   time-continuously sampled data.

4.2  The format of the skeleton secondary headers

   The skeleton secondary headers are a sequence of packets, each of
   which contains information about one of the other time-continuous or
   time-instantaneous logical bitstreams contained within the Ogg
   physical bitstream.  A skeleton secondary header packet has the
   following format:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Identifier 'fisbone\0'                                        | 0-3
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               | 4-7
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Offset to message header fields                               | 8-11
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Serial number                                                 | 12-15
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Number of header packets                                      | 16-19
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Granulerate numerator                                         | 20-23
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               | 24-27
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Granulerate denominator                                       | 28-31
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               | 32-35
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Startgranule                                                  | 36-39
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               | 40-43
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Preroll                                                       | 44-47
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Granuleshift  | Padding/future use                            | 48-51
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | Message header fields ...                                     | 52-
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


   Fields with more than one Byte length are encoded LSB (least
   significant Byte) first.

   The fields in a skeleton secondary header packet have the following
   meaning:
   1.  Identifier: an 8 Byte field that identifies this packet as a
        skeleton secondary header for identifying other logical
        bitstreams.  It contains the magic numbers:
           0x66 'f'
           0x69 'i'
           0x73 's'
           0x62 'b'
           0x6f 'o'
           0x6e 'n'
           0x65 'e'
           0x00 '\0'
   2.  Offset to message header fields: 4 Byte unsigned integer that
        contains the offset in Bytes from the start of this field (Byte
        8) to the message header fields.  For the version of the
        skeleton bitstream described in this document this number is
        fixed to 44, so that the message header fields start at Byte 52.
        This field accommodates future changes to the skeleton bitstream
        by allowing parsers to locate the message header fields even if
        more fields get inserted before them.
   3.  Serial number: 4 Byte unsigned integer containing the
        bitstream_serial_number of the Ogg logical bitstream described
        by this skeleton secondary header packet and thus connecting it
        to the logical bitstream.
   4.  Number of header packets: a 4 Byte unsigned integer that contains
        the number of header packets of that particular logical
        bitstream, i.e. the packet on its bos page plus all of its
        secondary header packets.
   5.  Granulerate numerator & denominator: 8 Byte signed integers
        each.  They represent the temporal resolution of the logical
        bitstream in Hz given as a rational number in the same way as
        the basetime attribute above.
   6.  Startgranule: 8 Byte signed integer that represents the granule
        number with which this logical bitstream starts, which is
        originally 0, but will be a positive offset when only a subpart
        of the stream is requested.
   7.  Preroll: 4 Byte unsigned integer that contains the number of
        packets to pre-roll in order to decode a current packet
        correctly.  This is for example the case with Ogg Vorbis, which
        requires a pre-roll of 2 packets.
   8.  Granuleshift: a 1 Byte unsigned integer describing whether to
        partition the granule_position into two for that logical
        bitstream, and how many of the lower bits to use for the
        partitioning.  The upper bits then still signify a
        time-continuous granule position for a directly decodable and
        presentable data granule.  The lower bits allow for
        specification of a finer resolution such that for example
        predicted frames of a video can be addressed as well, though not
        decoded without tracing back to the last fully decodable data
        granule.  This is e.g.  the case with Ogg Theora.
   9.  Padding/future use: 3 Bytes of padding that are reserved for
        future requirements and MUST be set to zero in this revision.
   10.  Message header fields: header fields, following the generic
        Internet Message Format defined in RFC 2822 [7].  Each header
        field consists of a name followed by a colon (":") and the field
        value.  Field names are case-insensitive.  The field value MAY
        be preceded by any amount of LWS, though a single SP is
        preferred.  Header fields can be extended over multiple lines by
        preceding each extra line with at least one SP or HT.
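   The fixed part of a fisbone packet can be read with a few lines of
   code.  The following Python sketch is illustrative only: the names
   "FISBONE_FIXED", "Fisbone" and "parse_fisbone" are hypothetical, and
   it assumes, consistent with the diagram above, that the offset field
   counts from Byte 8, so that the message header fields start at Byte
   8 + 44 = 52.

```python
import struct
from dataclasses import dataclass

# Fixed part of a fisbone packet (Bytes 0-51), little-endian ("LSB first").
# "<" disables alignment padding; "3x" skips the padding Bytes 49-51.
FISBONE_FIXED = struct.Struct("<8sIIIqqqIB3x")


@dataclass
class Fisbone:
    serial_number: int
    num_header_packets: int
    granulerate_numerator: int
    granulerate_denominator: int
    startgranule: int
    preroll: int
    granuleshift: int
    message_headers: bytes  # raw, still-encoded header field block


def parse_fisbone(packet: bytes) -> Fisbone:
    if len(packet) < FISBONE_FIXED.size:
        raise ValueError("packet too short for a fisbone header")
    (ident, offset, serial, nheaders, num, den,
     startgranule, preroll, shift) = FISBONE_FIXED.unpack_from(packet, 0)
    if ident != b"fisbone\x00":
        raise ValueError("not a skeleton fisbone packet")
    # Assumption: the offset is counted from Byte 8, the start of its field.
    start = 8 + offset
    if start > len(packet):
        raise ValueError("offset points beyond the end of the packet")
    return Fisbone(serial, nheaders, num, den, startgranule, preroll,
                   shift, packet[start:])
```

   Reading the offset from the packet, rather than hard-coding Byte 52,
   keeps such a parser usable if later versions insert fields before the
   message header fields.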

   There is one mandatory Message header field for all of the logical
   bitstreams: the "Content-type" header field.  For an application that
   is parsing the Annodex bitstream, this field contains the MIME type
   and the character encoding of the data in the logical bitstream.
   E.g.  for the annotation bitstream, this field will contain the value
   "Content-type: text/x-cmml; UTF-8" if the character set used in the
   CMML bitstream is UTF-8.  E.g.  for a bitstream containing Ogg Vorbis
   data the value is "Content-type: audio/x-vorbis".  The Content-type
   message header field MUST be the first of the Message header fields,
   so that it can be found at a fixed location in the skeleton fisbone
   packet.

   As per RFC 2277 [8], message header fields are considered protocol
   data, i.e.  they are not expected to contain human-readable text,
   and they MUST be entirely encoded in UTF-8.  In addition, the
   mandatory header fields MUST be encoded in US-ASCII, and it is
   recommended to also use US-ASCII code points as much as possible for
   the optional header fields.

   User defined optional message header fields MUST follow the naming
   standard given in RFC2822.
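   A sketch of how a parser might split the message header fields and
   check the Content-type rules above, in Python.  It assumes that lines
   are terminated by CRLF or LF and that empty lines can be ignored;
   the function name "parse_message_headers" is hypothetical.

```python
def parse_message_headers(block: bytes) -> list[tuple[str, str]]:
    # Message header fields are protocol data and MUST be UTF-8.
    text = block.decode("utf-8")
    fields: list[tuple[str, str]] = []
    for line in text.replace("\r\n", "\n").split("\n"):
        if not line:
            continue  # assumption: empty lines carry no field
        if line[0] in " \t":
            # Continuation line (starts with SP or HT): append it to the
            # previous field's value, collapsing the folding whitespace.
            if not fields:
                raise ValueError("continuation line without a field")
            name, value = fields[-1]
            fields[-1] = (name, value + " " + line.strip())
            continue
        name, colon, value = line.partition(":")
        if not colon or not name:
            raise ValueError(f"malformed header line: {line!r}")
        fields.append((name, value.strip()))  # drop LWS around the value
    # Field names are case-insensitive; Content-type MUST come first.
    if not fields or fields[0][0].lower() != "content-type":
        raise ValueError("Content-type must be the first header field")
    if not fields[0][1].isascii():
        raise ValueError("mandatory header fields must be US-ASCII")
    return fields
```

   For example, parse_message_headers(b"Content-type: audio/x-vorbis\r\n")
   returns [("Content-type", "audio/x-vorbis")].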

4.3  Media mapping of skeleton into Ogg

   The media mapping for skeleton into Ogg is as follows:
   o  The skeleton ident (fishead) header is mapped into the skeleton
      bos page.
   o  The secondary header pages of a skeleton logical bitstream consist
      of the fisbone header packets that each describe one particular
      logical data bitstream within the Ogg physical bitstream.
   o  There are no content pages or data packets.  As the skeleton eos
      page is included before the first data page of any logical
      bitstream, there actually cannot be any content data packets.
   o  The skeleton eos page contains one packet of length zero.

   When using a skeleton logical bitstream in Ogg, a further restriction
   on the order in which Ogg pages appear is introduced to allow for
   easier identification:
   1.  The skeleton bos page is the very first bos page.  This allows
       its differentiation from other Ogg bitstreams that don't contain
       a skeleton logical bitstream.
   2.  The bos pages of the other logical bitstreams come next as is a
       requirement of the Ogg bitstream format.
   3.  The secondary header pages of all the logical bitstreams in the
       Ogg physical bitstream come next, as is also a requirement of
       Ogg.  The skeleton secondary header pages are also included here.
   4.  Before any data pages of any of the logical bitstreams appear in
       the Ogg physical bitstream, the skeleton eos page MUST end the
       skeleton logical bitstream.  This is necessary to end the control
       section of the bitstream.  If an Ogg stream parser reaches the
       skeleton eos page, it knows that it has received all the bos and
       secondary header pages and can start setting up its decoding or
       parsing environment.
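   The four ordering rules can be checked mechanically.  The Python
   sketch below models each Ogg page as a hypothetical (serial, kind)
   pair, where kind is one of "bos", "header", "eos" or "data"; a real
   implementation would derive these from the Ogg page header flags and
   from the header packet counts in the fisbone packets.

```python
def check_skeleton_order(pages: list[tuple[int, str]],
                         skeleton_serial: int) -> None:
    """Raise ValueError if the page sequence breaks ordering rules 1-4.

    Each page is a hypothetical (serial, kind) pair with kind one of
    "bos", "header", "eos" or "data".
    """
    # Rule 1: the skeleton bos page is the very first page.
    if not pages or pages[0] != (skeleton_serial, "bos"):
        raise ValueError("rule 1: skeleton bos page must come first")
    seen_non_bos = False
    skeleton_ended = False
    for serial, kind in pages:
        # Rules 2 and 3: all bos pages precede all other pages.
        if kind == "bos":
            if seen_non_bos:
                raise ValueError("rule 2: bos page after a non-bos page")
        else:
            seen_non_bos = True
        # Rule 4: the skeleton eos page separates headers from data.
        if serial == skeleton_serial and kind == "eos":
            skeleton_ended = True
        elif kind == "header" and skeleton_ended:
            raise ValueError("rule 4: header page after skeleton eos")
        elif kind == "data" and not skeleton_ended:
            raise ValueError("rule 4: data page before skeleton eos")
```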


5.  Handling time in Annodex format bitstreams

   With time-continuous data like Annodex, one needs to handle data at
   four different levels:
   o  at the Bytes level, upon seeking.
   o  at the packets level, upon encapsulating.
   o  at the granules level, upon recomposing.
   o  at the time level, upon displaying and addressing.
   This section explains how they all fit together.

5.1  Conceptual overview

   Annodex bitstreams inherently represent one timeline only, where the
   different logical bitstreams can be thought of as content tracks on
   that timeline.  All of these tracks relate to the same timeline which
   starts at a certain time point and ends when the last bitstream ends.

   An example bitstream can be seen in the following figure.  It
   consists of an Annodex bitstream that contains 4 media bitstreams and
   one CMML bitstream.  The picture is a conceptual representation of
   the time intervals covered by the different logical bitstreams and
   the Ogg pages used to encapsulate the data.  In the flat
   representation these are multiplexed such that the data packets of
   each of these bitstreams occur at the correct time.


                                t_uri
                                  |
   t_0                            v                                      t_n
   |------------------------------------------------------------------->|
   ----------------------------------------------------------------------
   |clip1  | clip 2 |/clip 3///////////////| clip 4                     |
   ----------------------------------------------------------------------
   CMML bitstream

   ----------------------------------------------
   |  |  |  |  |  |  |  |  |  |  |//|  |  |  |  |
   ----------------------------------------------
   audio bitstream 1
           -------------------------------------------------------------
           |     |     |     |/////|     |     |     |     |     |     |
           -------------------------------------------------------------
           video bitstream 1
                    ----------------------------------------------------
                    |  |  |  |  |//|  |  |  |  |  |  |  |  |  |  |  |  |
                    ----------------------------------------------------
                    audio bitstream 2
                           -------------------------------
                           |     |/////|     |     |     |
                           -------------------------------
                           video bitstream 2

   The time point at which an Annodex bitstream starts (t_0 in the above
   diagram) is called the "basetime" and represents the time in seconds
   associated with the granule position of 0 on all logical bitstreams.
   Typically, a newly created Annodex file starts all its logical
   bitstreams at granule position 0, whereas a typical extract of an
   Annodex bitstream, such as the one starting at t_uri in the figure
   above, starts each of its logical bitstreams at a different granule
   position.  These granule positions are stored in the "startgranule"
   field of the skeleton secondary header packets.

   The "basetime" of an Annodex bitstream may be 0, but it can also be
   any positive time.  For example, in professional video production,
   the first frame of video of a program normally refers to a SMPTE
   basetime of 01:00:00:00, not 00:00:00:00 (see also the temporal URI
   addressing [4] specification).  Applying such a practice to a
   digital video resource requires a way to store that basetime with the
   resource and to interpret it correctly when addressing offsets such
   as t_uri.  Annodex provides such a mapping through the basetime field
   in the skeleton ident header.

   Also associated with the basetime is a calendar date and wall-clock
   time (a "UTC base") which represent a real-world time giving some
   meaningful calendar date association to the content such as the
   creation time or the first presentation time.  The UTC base is
   specified in the UTC field of the skeleton ident header.

5.2  Mapping a granule position to a time position

   Each of the encapsulated data bitstreams and the CMML bitstream has
   its own temporal resolution at which it provides data to cover the
   given timeline.  This temporal resolution is usually given
   through the sampling rate of the particular bitstream.  For example,
   a raw audio bitstream at CD quality is sampled with a sampling rate
   of 44100 Hz.  A video bitstream may be sampled with a frame rate of
   25 frames per second.

   This temporal resolution is called the "granulerate".  A granule is a
   data element that is based on a regular data rate specific to the
   content type, such as the frame rate for video or the sampling rate
   for audio.  It even exists for bitstreams that are not sampled at a
   regular rate; in that case it is the highest resolution of any of the
   sampling rates used.  The granulerate is specified in the skeleton
   secondary header packets for each logical bitstream.

   Each of the bitstreams inserts data into the Ogg bitstream through
   packets which have an associated temporal duration based on the
   encoder packaging.  Packets are packaged into Ogg pages, which have a
   granule position associated with them.  Not taking the special case
   of a granuleshift into account, the granule position specifies the
   number of granules that have been encapsulated since the implicit
   start of the original bitstream up to and including the given Ogg
   page.

   The granule position together with the granulerate and granuleshift
   information of the skeleton secondary header packets for the
   particular logical bitstream are used for the calculation of the time
   position for which a data packet of the logical bitstream completes
   data.  A granule position of -1 indicates a special case and MUST NOT
   be used for calculation of a mapping to time.

   In principle, the granule position of an Ogg page divided by the
   granulerate of this page's logical bitstream provides the time
   position that is reached in that bitstream after decoding all data
   packets finished on this page.  However, the granule_position field
   in an Ogg page allows for a more fine-grained description of the
   temporal position.  The following image explains the composition of
   the granule_position field in an Ogg page:

           granule_position
           ------------------------------------------------
           |  keyindex               |  keyoffset         |
           ------------------------------------------------

   The granuleshift field of the skeleton secondary header packets
   describes how many of the granule_position's 64 bits are being used
   for the keyoffset.  The keyoffset part of the granule_position is
   commonly used when the logical bitstream consists of packets that can
   only be fully decoded when referring back to a previous packet.  For
   example, video streams often consist of inter and intra coded frames,
   where the intra frames are fully decodable and the inter frames are
   intermediate frames that require backtracking to the last intra frame
   for accurate decoding.  Another example is a logical bitstream that
   is mapped as instantaneous information (i.e.  its granule position
   represents the start time and the end time of the packet data), but
   actually has a duration associated with it, which is provided through a
   subsequent packet.  CMML is such an example.  The keyindex part of
   the granule_position is then used to provide the temporal position of
   the reference packet and the keyoffset part provides a counter for
   the data in between.
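   Splitting a granule_position into its two parts, and composing it
   again, is a shift and a mask: the keyindex occupies the upper bits
   and the keyoffset the lower granuleshift bits.  A minimal Python
   sketch, assuming non-negative granule positions; the helper names
   are hypothetical.

```python
def split_granulepos(granulepos: int, granuleshift: int) -> tuple[int, int]:
    """Split a granule_position into (keyindex, keyoffset)."""
    if granulepos < 0:
        raise ValueError("negative granule positions (e.g. -1) carry no time")
    keyindex = granulepos >> granuleshift               # upper bits
    keyoffset = granulepos & ((1 << granuleshift) - 1)  # lower granuleshift bits
    return keyindex, keyoffset


def join_granulepos(keyindex: int, keyoffset: int, granuleshift: int) -> int:
    """Inverse of split_granulepos."""
    if not 0 <= keyoffset < (1 << granuleshift):
        raise ValueError("keyoffset does not fit into granuleshift bits")
    return (keyindex << granuleshift) | keyoffset
```

   With a granuleshift of 0 the keyoffset is always 0 and the keyindex
   is the whole granule position.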

   The calculation of the temporal position of an Ogg page in Annodex is
   thus specified through the following algorithm:

   t_page = basetime + ((keyindex + keyoffset) / granulerate)

   The basetime provides the time offset used at the beginning of the
   logical bitstream for the first data packet and thus MUST be added
   for a correct calculation of the temporal position.
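   The algorithm above translates directly into code.  This Python
   sketch (hypothetical function name "page_time") uses exact rational
   arithmetic, since both the basetime and the granulerate are given as
   rational numbers; the granulerate is passed as the numerator and
   denominator taken from the fisbone packet.

```python
from fractions import Fraction


def page_time(basetime: Fraction, granulepos: int, granuleshift: int,
              rate_num: int, rate_den: int) -> Fraction:
    """t_page = basetime + ((keyindex + keyoffset) / granulerate)."""
    if granulepos == -1:
        raise ValueError("granule position -1 MUST NOT be mapped to time")
    keyindex = granulepos >> granuleshift
    keyoffset = granulepos & ((1 << granuleshift) - 1)
    granulerate = Fraction(rate_num, rate_den)
    return basetime + (keyindex + keyoffset) / granulerate
```

   For the audio example that follows, page_time(Fraction(4), 88200, 0,
   44100, 1) returns Fraction(6, 1).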

   As an example, consider an audio bitstream that has a granulerate of
   44100 (i.e.  44100 samples per second), a granuleshift of 0, and a
   basetime of 4 sec.  When reaching a granule_position of 88200, this
   maps to a time position of 6 seconds:

   t_page = 4 + ((88200 + 0) / 44100) = 6

   This signifies that 2 seconds of the audio bitstream have been
   decoded after the end of decoding this page's packets, which maps to
   6 seconds because of the basetime.

   As another example, consider a video bitstream that has a granulerate
   of 25 (i.e.  25 frames per second), a granuleshift of 3 (because it
   encodes, say, 7 partial frames between each fully encoded frame),
   and a basetime of 0 sec.  When reaching a granule_position of 501,
   i.e.  a keyindex of 62 and a keyoffset of 5 (501 = 62 * 2^3 + 5),
   this maps to a time position of 2.68 seconds:

   t_page = 0 + ((62 + 5) / 25) = 2.68 sec

   The granulerate of a time-instantaneous bitstream such as the CMML
   bitstream can be chosen arbitrarily by the bitstream multiplexer.
   By default, a granulerate of 1000 is used, which is the resolution
   of npt.  The resolution of all the time schemes is given as:
   o  npt: 1000 (milliseconds)
   o  smpte-24: 24 (24 fps)
   o  smpte-24-drop: 24/1.001 = 23.976 (approx.  as per SMPTE)
   o  smpte-25: 25
   o  smpte-30: 30
   o  smpte-30-drop: 30/1.001 = 29.970 (approx.  as per SMPTE)
   o  smpte-50: 50
   o  smpte-60: 60
   o  smpte-60-drop: 60/1.001 = 59.940 (approx.  as per SMPTE)
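   For use in the fisbone granulerate numerator and denominator fields,
   the drop-frame rates can be stored exactly as N*1000/1001 rather than
   as the rounded decimals.  A Python sketch of such a table; the names
   "TIME_SCHEME_RATES" and "granulerate_fields" are hypothetical.

```python
from fractions import Fraction

# Granulerates of the time schemes above as exact rationals; the
# drop-frame rates N/1.001 are stored as N*1000/1001.
TIME_SCHEME_RATES = {
    "npt": Fraction(1000),
    "smpte-24": Fraction(24),
    "smpte-24-drop": Fraction(24000, 1001),
    "smpte-25": Fraction(25),
    "smpte-30": Fraction(30),
    "smpte-30-drop": Fraction(30000, 1001),
    "smpte-50": Fraction(50),
    "smpte-60": Fraction(60),
    "smpte-60-drop": Fraction(60000, 1001),
}


def granulerate_fields(scheme: str) -> tuple[int, int]:
    """(numerator, denominator) for the fisbone granulerate fields."""
    rate = TIME_SCHEME_RATES[scheme]
    return rate.numerator, rate.denominator
```

   For example, granulerate_fields("smpte-30-drop") returns
   (30000, 1001).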

   The granule position of the page finishing data of a
   time-instantaneous bitstream packet MUST signify the start time of
   that packet.  For example, a CMML bitstream with a granulerate of
   1000, a basetime of 0, and a clip that lasts from npt=12.020 till
   npt=15.0 will get a granule_position of 12020.  In contrast, the
   granule_position of the page finishing data of e.g.  an audio
   bitstream with granulerate 44100, basetime 0 and containing data from
   npt=12.020 to npt=15.0 will be 661500.
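   The difference between the two cases can be expressed as a small
   Python function.  It is a sketch that assumes a granuleshift of 0
   and data that ends on a granule boundary; "last_page_granulepos" is
   a hypothetical name.

```python
from fractions import Fraction


def last_page_granulepos(start: Fraction, end: Fraction,
                         granulerate: Fraction, basetime: Fraction,
                         instantaneous: bool) -> int:
    """Granule position of the page that finishes a packet's data.

    Time-instantaneous bitstreams (e.g. CMML) signal the packet's start
    time; time-continuous ones (e.g. audio) the time reached at its end.
    Assumes a granuleshift of 0.
    """
    t = start if instantaneous else end
    granules = (t - basetime) * granulerate
    if granules.denominator != 1:
        raise ValueError("time does not fall on a granule boundary")
    return granules.numerator
```

   With the values above, the CMML clip yields 12020 and the audio data
   yields 661500.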

   A note about field overflows: an overflow of the granule position
   field can destroy the temporal integrity of the Annodex physical
   bitstream.  In this case, a multiplexer MUST end the Annodex physical
   bitstream and restart a new one resetting the counter to 0 and
   adjusting the basetime appropriately.  This is also called sequential
   multiplexing in Ogg.  The same measure MUST be taken in case of an
   overflow of the page_sequence_number on one of the logical
   bitstreams.
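   One way to read "adjusting the basetime appropriately" is that the
   new physical bitstream's basetime is the time already reached by the
   old one, so that its granule position 0 maps to the same instant.
   This Python sketch assumes a granuleshift of 0 and treats Ogg's
   granule_position as a signed 64-bit field (as -1 is reserved); the
   names are hypothetical.

```python
from fractions import Fraction

# Ogg's granule_position is a signed 64-bit field (-1 is reserved).
GRANULEPOS_MAX = 2**63 - 1


def would_overflow(granulepos: int, increment: int) -> bool:
    """True if advancing by `increment` granules exceeds the field."""
    return granulepos + increment > GRANULEPOS_MAX


def restart_basetime(basetime: Fraction, granulepos: int,
                     granulerate: Fraction) -> Fraction:
    """Basetime for the next, sequentially multiplexed physical bitstream.

    Its granule counter restarts at 0, so its basetime is the time the
    old bitstream had reached (t_page with a granuleshift of 0).
    """
    return basetime + granulepos / granulerate
```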

5.3  Addressing/seeking into the bitstream

   Addressing into an Annodex bitstream is possible with the temporal
   URI addressing [4] scheme.  Time is specified as a temporal offset
   from the "beginning" of the stream, making use of the basetime field.
   Time offsets can also be specified as calendar dates and times.  The
   UTC base is then used as a basis for offsetting.

   The basetime allows correct mapping of a temporal offset point such as
   a temporal URI to a Byte position in the stream.  In the above figure
   take t_uri=npt:14.0 as the temporal offset addressed on a stream with
   t_0=npt:5.0 as the basetime - this requires a stream offsetting of
   only 9 sec to the appropriate granule position in each of the
   bitstreams, in the figure marked through patterned pages.
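   In code, this offsetting amounts to subtracting the basetime and
   scaling by each bitstream's granulerate.  A Python sketch with a
   hypothetical function name, which rounds down to a whole granule (an
   assumption):

```python
import math
from fractions import Fraction


def target_granule(t_uri: Fraction, basetime: Fraction,
                   granulerate: Fraction) -> int:
    """Granule index addressed by a temporal offset in one bitstream.

    The offset is taken relative to the basetime and rounded down to a
    whole granule (assumption).
    """
    if t_uri < basetime:
        raise ValueError("requested time lies before the basetime")
    return math.floor((t_uri - basetime) * granulerate)
```

   With the numbers above, target_granule(Fraction(14), Fraction(5),
   Fraction(25)) gives 225 for a 25 frames per second video bitstream.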

   The seeking action is performed on the interleaved bitstream, in
   which the data packets occur in a temporally consecutive order based
   on the time at which their data ends.  These times are represented in
   the granule positions of the Ogg pages, which are only allowed to
   monotonically increase within one logical bitstream.  This implies
   that when having found an Ogg page with a granule position that maps
   to a given seek time (i.e.  covers the time or ends at it), the seek
   has found the right location.  This applies over all logical
   bitstreams.  In the above example, this means that the Byte position
   of the first occurring page of the patterned pages has been found.
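   A simplified Python sketch of locating that first page.  The "Page"
   and "StreamInfo" records are hypothetical summaries of what a parser
   would extract from the Ogg page headers and fisbone packets, and the
   pages are scanned linearly for clarity; since the times increase
   through the file, a real implementation would rather bisect over
   Byte positions.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional


@dataclass
class Page:  # hypothetical summary of one Ogg page
    byte_offset: int
    serial: int
    granulepos: int


@dataclass
class StreamInfo:  # hypothetical: values from a fisbone packet
    granulerate: Fraction
    granuleshift: int


def time_of_page(page: Page, info: StreamInfo, basetime: Fraction) -> Fraction:
    shift = info.granuleshift
    keyindex = page.granulepos >> shift
    keyoffset = page.granulepos & ((1 << shift) - 1)
    return basetime + (keyindex + keyoffset) / info.granulerate


def find_seek_page(pages: list[Page], streams: dict[int, StreamInfo],
                   basetime: Fraction, seek_time: Fraction) -> Optional[Page]:
    """First page, in file order, whose time covers or ends at seek_time."""
    for page in pages:
        if page.serial not in streams or page.granulepos == -1:
            continue  # e.g. skeleton pages, or no packet ends on this page
        if time_of_page(page, streams[page.serial], basetime) >= seek_time:
            return page
    return None
```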

   There is a complication to the seeking: some logical bitstreams have
   backwards dependencies in their data packets and these have to be
   taken into account for seeking.  For example, a logical bitstream may
   require several of its previous packets to allow a correct and
   complete decoding of the actual packet that occurs at the seektime.
   This is the case for Theora, which requires going back to the
   previous keyframe when decoding from a time offset.  It is also the
   case for Vorbis, which requires the previous 2 packets for accurate
   setup of the frequency transform; Speex needs approximately 2 packets
   for similar reasons.  Even instantaneous bitstreams such as CMML may
   require going back to a previous packet to recover the last state
   information, which in the case of CMML is the currently active clip.

   Therefore, once seeking has located the correct Byte position that
   refers to the given temporal offset, it MUST seek back.  For logical
   bitstreams that have a non-zero "granuleshift" in the skeleton, it
   MUST seek back to the Ogg page that has a "keyindex" granule
   position.  For logical bitstreams that have a non-zero "preroll" in
   the skeleton, it MUST seek back that many packets.  The earliest Byte
   position that satisfies all these requirements is the correct seek
   position.
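   The backwards adjustment can be sketched in Python as follows.  To
   keep it short, it assumes, unlike real Ogg streams, that every page
   carries exactly one packet, so that "that many packets" becomes
   "that many pages" and the keyframe page is the one whose keyoffset
   is 0; all names are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional


@dataclass
class Page:  # hypothetical page record, one packet per page
    byte_offset: int
    serial: int
    granulepos: int


@dataclass
class StreamParams:  # hypothetical: values from a fisbone packet
    granulerate: Fraction
    granuleshift: int
    preroll: int


def seek_position(pages: list[Page], streams: dict[int, StreamParams],
                  basetime: Fraction, seek_time: Fraction) -> Optional[int]:
    """Earliest Byte position satisfying every bitstream's dependencies."""
    earliest: Optional[int] = None
    for serial, p in streams.items():
        own = [pg for pg in pages if pg.serial == serial and pg.granulepos != -1]
        mask = (1 << p.granuleshift) - 1
        times = [basetime + ((pg.granulepos >> p.granuleshift)
                             + (pg.granulepos & mask)) / p.granulerate
                 for pg in own]
        hit = next((i for i, t in enumerate(times) if t >= seek_time), None)
        if hit is None:
            continue  # this bitstream ends before the seek time
        candidates = [hit - p.preroll]  # preroll: that many packets back
        if p.granuleshift:
            # Walk back to the keyframe page (keyoffset 0) named by keyindex.
            key_gp = (own[hit].granulepos >> p.granuleshift) << p.granuleshift
            k = hit
            while k > 0 and own[k].granulepos != key_gp:
                k -= 1
            candidates.append(k)
        offset = own[max(0, min(candidates))].byte_offset
        earliest = offset if earliest is None else min(earliest, offset)
    return earliest
```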

   A player that presents from an offset MUST take into account that the
   bitstream may contain some packets that are only there to allow
   accurate decoding of the seek time.  When the backwards dependencies
   were resolved for a specific logical bitstream, several non-relevant
   Ogg pages of other logical bitstreams may also have ended up in
   between.  These have to be skipped by a player.  The time that a
   player MUST start presenting from is given in the "presentationtime"
   field of the skeleton ident header.

5.4  Remultiplexing a bitstream

   When a subpart of an Annodex bitstream is requested, such as through
   a temporal URI query request from a Web server, the bitstream MUST be
   recomposed and a remultiplexed bitstream served out.  There are
   several aims for performing the remultiplexing with as little effort
   and therefore as little delay as possible:
   o  no decoding of the logical bitstreams is performed.
   o  no changes to the pages, in particular to the granule positions
      are made.
   o  changes occur only to the control section.

   The fields of the skeleton track make all these aims achievable.
   Remultiplexing is essentially achieved by seeking to the position as
   described above and then including from each logical bitstream only
   the relevant Ogg pages into the new stream.  Changes to fields in the
   bitstream are restricted to the control section:
   o  the "presentationtime" MUST be adjusted to the requested start
      time.
   o  the "startgranule" for each logical bitstream MUST be adjusted to
      the granule position at which each logical bitstream starts.  This
      is not the first granule position of the Ogg pages included into
      the bitstream, but rather the last one that did not get included,
      as it represents the start time of the bitstream.
   Everything else, and in particular the Ogg pages, stay the same.
   This is important also to allow caching of such files as is required
   for Web proxies and described in temporal URI addressing [4].
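   The "startgranule" rule can be sketched in Python, again using a
   hypothetical page record and assuming, as a simplification, that the
   new stream is cut at a single Byte position found by seeking; a
   bitstream with no left-out page keeps a startgranule of 0 (an
   assumption).

```python
from dataclasses import dataclass


@dataclass
class Page:  # hypothetical page record, in file order
    byte_offset: int
    serial: int
    granulepos: int


def remux_startgranules(pages: list[Page], cut_offset: int,
                        serials: list[int]) -> dict[int, int]:
    """New "startgranule" per logical bitstream for a cut at cut_offset.

    It is the granule position of the last page of each bitstream that
    is left out (i.e. lies before cut_offset); 0 if none is left out.
    """
    start = {serial: 0 for serial in serials}
    for page in pages:
        if page.byte_offset >= cut_offset:
            break  # pages from here on belong to the new stream
        if page.serial in start and page.granulepos != -1:
            start[page.serial] = page.granulepos
    return start
```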


6.  MIME media type applications

6.1  MIME media type registration for 'application/annodex'

   This section contains the registration information for the
   "application/annodex" media type.  While this media type is not
   approved by the IANA, "application/x-annodex" may be used.

   To: ietf-types@iana.org

   Subject: Registration of MIME media type application/annodex

   MIME media type name: application

   MIME subtype name: annodex

   Required parameters: none

   Optional parameters: none

   Encoding Considerations: Annodex is an exchange format for any type
   of encoded time-continuously sampled data stream.  The authoring
   software MUST identify the encoding of each bitstream by providing
   its MIME type (and potentially the charset for text-based formats)
   in the "Content-type" Message header field of that bitstream.  The
   client software can select an appropriate decoder based on this
   information.

   Security considerations: see next section.

   Interoperability considerations: the Annodex bitstream format is a
   free specification that is independent of any media encoding format.
   It is designed to provide interoperability with the existing World
   Wide Web.

   Additional information:
      Magic numbers: "OggS" identifies an Ogg page at Byte position 0,
      "fishead\0" identifies a skeleton logical bitstream at Byte
      position 28.  In the second Ogg page at Byte position 28 the magic
      number "CMML\0\0\0\0" can be found, identifying this as an Annodex
      bitstream.
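   A content sniffer could check these magic numbers as in the Python
   sketch below.  It assumes the layout described above (skeleton bos
   on the first page, CMML bos on the second), locates each page body
   through the Ogg segment table instead of hard-coding Byte 28, and
   does not verify the Ogg page checksum; the names are hypothetical.

```python
def _ogg_page_body(data: bytes, pos: int) -> tuple[int, int]:
    """Return (body_start, next_page_start) for the Ogg page at pos."""
    if len(data) < pos + 27 or data[pos:pos + 4] != b"OggS":
        raise ValueError("no Ogg page at this position")
    nsegs = data[pos + 26]                      # page_segments field
    table = data[pos + 27:pos + 27 + nsegs]     # lacing values
    if len(table) < nsegs:
        raise ValueError("truncated segment table")
    body = pos + 27 + nsegs
    return body, body + sum(table)


def sniff_annodex(data: bytes) -> bool:
    """Check the skeleton and CMML magic numbers of the first two pages."""
    try:
        body1, next_page = _ogg_page_body(data, 0)
        body2, _ = _ogg_page_body(data, next_page)
    except ValueError:
        return False
    return (data[body1:body1 + 8] == b"fishead\x00"
            and data[body2:body2 + 8] == b"CMML\x00\x00\x00\x00")
```

   With a single lacing value per page, the page header is 28 Bytes
   long, which is why "fishead\0" appears at Byte position 28.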
      File extension: .anx
      Macintosh File Type Code: "ANDX"
      Intended usage: COMMON

6.1.1  URI addressing into Annodex bitstreams

   As Annodex bitstreams are time-continuous Web resources, hyperlinking
   into Annodex bitstreams via URIs is possible with the temporal URI
   query and fragment specification [4].  For the query case, an Annodex
   server MUST support the "X-Accept-TimeURI" HTTP header field (see
   the temporal URI query specification [4] for more details).  The
   "X-Accept-Range-Redirect" and "X-Range-Redirect" HTTP header fields
   MAY also be supported by an Annodex server and user agent.

   As Annodex bitstreams contain CMML logical bitstreams, URI addressing
   of clips via their name given in the "id" tag is also supported.  The
   same mechanisms as specified in the CMML specification [2] apply to
   Annodex analogously.  In particular, the id addressing is also
   regarded as an alias for a time offset and an Annodex conformant
   server that supports Annodex temporal URI addressing MUST also
   support named URI addressing (see the CMML specification [2] for more
   details).

   Examples for valid URI addresses:
   o  http://example.com/sample.anx?t=npt:4 , which relates to an
      Annodex bitstream composed by the server from sample.anx by
      starting it at an offset of 4 seconds.
   o  http://example.com/sample.anx?id=dolphin --- relates to the clip
      whose id attribute value is "dolphin" and all further clips after
      that.
   o  http://example.com/sample.anx?id="dolphin/" --- relates only to
      the clip whose id attribute value is "dolphin".
   o  http://example.com/sample.anx?id="intro/goldfish" --- relates to
      all the clips from the "intro" clip to the "goldfish" clip.
   o  http://example.com/sample.anx#t=npt:4 --- start using the Annodex
      bitstream from a 4 second offset.
   o  http://example.com/sample.anx#dolphin --- use the clip with
      id="dolphin" only.

6.1.2  HTTP 'Accept' header field interpretation

   An Annodex file and the CMML file that can be extracted from it are
   very tightly related to each other: the CMML file contains all
   annotation and indexing information about the Annodex file,
   including basetime and UTC time.  Therefore, receiving the CMML file
   instead of the Annodex file is like receiving all information about
   the bitstreams in the Annodex file except for the data bitstreams
   themselves.

   This situation can be taken advantage of with the "Accept" header of
   HTTP.  When an Annodex file is requested from an HTTP server and the
   acceptable content types given in the "Accept" message header field
   contain "text/x-cmml" with a higher priority than
   "application/x-annodex", then the HTTP server SHOULD return the CMML
   file instead of the requested Annodex file itself.  As is standard,
   the HTTP response will contain a "Content-type" field indicating what
   content was actually returned.  A Web crawler of a search engine,
   for example, can thus avoid extra network load and retrieve more
   easily parsable information.  It SHOULD set the "Accept" HTTP header
   to prefer "text/x-cmml" for every requested Annodex URI.  For example:

   Accept: text/x-cmml; q=1, application/x-annodex; q=0.5
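   A server-side sketch of this decision in Python.  It is deliberately
   simplified: media ranges with wildcards such as "*/*" are ignored,
   and the function names "parse_accept" and "choose_representation"
   are hypothetical.

```python
def parse_accept(header: str) -> dict[str, float]:
    """Map each media type in an HTTP Accept header to its q-value."""
    prefs: dict[str, float] = {}
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        prefs[parts[0].lower()] = q
    return prefs


def choose_representation(accept: str) -> str:
    """Return the media type to serve for a request for an Annodex file."""
    prefs = parse_accept(accept)
    cmml = prefs.get("text/x-cmml", 0.0)
    annodex = max(prefs.get("application/x-annodex", 0.0),
                  prefs.get("application/annodex", 0.0))
    # Serve CMML only if it has a strictly higher priority.
    return "text/x-cmml" if cmml > annodex else "application/x-annodex"
```

   For the example header above, choose_representation returns
   "text/x-cmml".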


6.2  MIME media type registration for 'video/annodex'

   This section contains the registration information for the
   "video/annodex" media type.  While this media type is not approved by
   the IANA, "video/x-annodex" may be used.

   To: ietf-types@iana.org

   Subject: Registration of MIME media type "video/annodex"

   MIME media type name: video

   MIME subtype name: annodex

   Required parameters: none

   Optional parameters: none

   Encoding Considerations: Annodex video is a subclass of Annodex data
   where there is at least one video track encapsulated together with
   the skeleton and CMML tracks, and a potentially unlimited number of
   other audio and video tracks.

   Security considerations: as in "application/annodex" MIME
   application.

   Interoperability considerations: as in "application/annodex" MIME
   application.

   Additional information:
      Magic numbers: as in "application/annodex" MIME application.
      File extension: .axv
      Macintosh File Type Code: "ANXV"
      Intended usage: COMMON
      URI addressing and HTTP header field use of "application/annodex"
      type content apply analogously to "video/annodex".

6.3  MIME media type registration for 'audio/annodex'

   This section contains the registration information for the
   "audio/annodex" media type.  While this media type is not approved by
   the IANA, "audio/x-annodex" may be used.

   To: ietf-types@iana.org

   Subject: Registration of MIME media type "audio/annodex"

   MIME media type name: audio

   MIME subtype name: annodex

   Required parameters: none

   Optional parameters: none

   Encoding Considerations: Annodex audio is a subclass of Annodex data
   where there is at least one audio track encapsulated together with
   the skeleton and CMML tracks, and a potentially unlimited number of
   other audio tracks.

   Security considerations: as in "application/annodex" MIME
   application.

   Interoperability considerations: as in "application/annodex" MIME
   application.

   Additional information:
      Magic numbers: as in "application/annodex" MIME application.
      File extension: .axa
      Macintosh File Type Code: "ANXA"
      Intended usage: COMMON
      URI addressing and HTTP header field use of "application/annodex"
      type content apply analogously to "audio/annodex".


7.  Security considerations

   Annodex format bitstreams contain several multiplexed binary media
   bitstreams and one XML annotation bitstream.  No generic encryption
   or signing mechanism is provided for the complete bitstream or for
   any one of its parts.  As the format of each encapsulated media
   bitstream is not prescribed, but is identified through the
   "Content-type" message header field in that bitstream's skeleton
   secondary header packet, it is possible to encrypt or sign a media
   bitstream and then label it with a MIME type that signifies the
   encryption.  It is up to the applications that use such a bitstream
   to provide an appropriate codec to handle it.

   As Annodex format bitstreams contain binary media bitstreams, they
   may also carry executable content.  This can be an issue for
   applications that decode these bitstreams, especially when they are
   used in a network scenario.  Such applications MUST handle
   manipulated bitstreams correctly and protect against buffer
   overflows and similar attacks.
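
   As an illustration of the defensive parsing this requires, the
   following Python sketch reads one Ogg page header (RFC 3533 [3]) and
   rejects it whenever the sizes it declares exceed the bytes actually
   available, instead of trusting them.  It is a hypothetical helper,
   not part of any Annodex library, and it deliberately omits CRC
   verification, which a real decoder would also perform.

```python
from typing import NamedTuple


class OggPage(NamedTuple):
    header_type: int
    granule_position: int
    serial_number: int
    sequence_number: int
    header_len: int  # 27 fixed bytes plus the segment table
    body_len: int    # sum of the segment table entries


def parse_ogg_page(buf: bytes, offset: int = 0) -> OggPage:
    """Parse one Ogg page header at `offset`, validating every length
    field against the data actually present in `buf`."""
    if offset < 0 or len(buf) - offset < 27:
        raise ValueError("truncated page header")
    if buf[offset:offset + 4] != b"OggS":
        raise ValueError("missing capture pattern")
    if buf[offset + 4] != 0:
        raise ValueError("unsupported stream structure version")
    header_type = buf[offset + 5]
    granule = int.from_bytes(buf[offset + 6:offset + 14], "little", signed=True)
    serial = int.from_bytes(buf[offset + 14:offset + 18], "little")
    sequence = int.from_bytes(buf[offset + 18:offset + 22], "little")
    # Bytes 22..25 hold the CRC checksum, which this sketch does not verify.
    n_segments = buf[offset + 26]
    header_len = 27 + n_segments
    if len(buf) - offset < header_len:
        raise ValueError("truncated segment table")
    body_len = sum(buf[offset + 27:offset + header_len])
    if len(buf) - offset < header_len + body_len:
        raise ValueError("page body runs past end of buffer")
    return OggPage(header_type, granule, serial, sequence, header_len, body_len)
```

   Checking each declared length before using it means that a forged
   segment table can at worst cause the page to be rejected, never an
   out-of-bounds read.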

8.  ChangeLog

   draft-pfeiffer-annodex-01:
   o  Annodex version 2.0: changes due to the renaming of CMML tags and
      to the temporal and named URI addressing.

   draft-pfeiffer-annodex-02:
   o  Annodex version 3.0: The changes pertain to the bitstream format
      and allow for a stronger decoupling of Annodex and CMML.  The
      Annodex format now uses the Ogg format with a "skeleton" and a
      "CMML" logical bitstream.  This change reinforces a layered
      approach that fits better with existing practice in Internet
      protocols, where each layer solves a specific problem without
      depending on the layers above it.

9.  References

   [1]   World Wide Web Consortium, "HTML 4.01 Specification", W3C HTML,
         December 1999, <http://www.w3.org/TR/html4/>.

   [2]   Pfeiffer, S., Parker, C. and A. Pang, "The Continuous Media
         Markup Language (CMML), Version 2.0 (work in progress)",
         I-D draft-pfeiffer-cmml-02.txt, March 2005,
         <http://www.annodex.net/TR/cmml.txt>.

   [3]   Pfeiffer, S., "The Ogg encapsulation format version 0",
         RFC 3533, May 2003, <http://www.ietf.org/rfc/rfc3533.txt>.

   [4]   Pfeiffer, S., Parker, C. and A. Pang, "Specifying time
         intervals in URI queries and fragments of time-based Web
         resources (work in progress)",
         I-D draft-pfeiffer-temporal-fragments-03.txt, March 2005,
         <http://www.annodex.net/TR/URI_fragments.txt>.

   [5]   Fielding, R., Gettys, J., Mogul, J., Nielsen, H., Masinter, L.,
         Leach, P. and T. Berners-Lee, "Hypertext Transfer Protocol --
         HTTP/1.1", RFC 2616, June 1999,
         <http://www.ietf.org/rfc/rfc2616.txt>.

   [6]   Schulzrinne, H., Rao, A. and R. Lanphier, "Real Time Streaming
         Protocol (RTSP)", RFC 2326, April 1998,
         <http://www.ietf.org/rfc/rfc2326.txt>.

   [7]   Resnick, P., "Internet Message Format", RFC 2822, April 2001,
         <http://www.ietf.org/rfc/rfc2822.txt>.

   [8]   Alvestrand, H., "IETF Policy on Character Sets and Languages",
         RFC 2277, January 1998, <http://www.ietf.org/rfc/rfc2277.txt>.

   [9]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, March 1997,
         <http://www.ietf.org/rfc/rfc2119.txt>.

   [10]  World Wide Web Consortium, "Extensible Markup Language (XML)
         1.0", W3C XML, October 2000,
         <http://www.w3.org/TR/2000/REC-xml-20001006>.

   [11]  World Wide Web Consortium, "XHTML(TM) 1.0 The Extensible
         HyperText Markup Language", W3C XHTML, January 2000,
         <http://www.w3.org/TR/xhtml1/>.

   [12]  Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform
         Resource Identifiers (URI): Generic Syntax", RFC 3986, January
         2005, <http://www.ietf.org/rfc/rfc3986.txt>.

   [13]  Alvestrand, H., "Tags for the Identification of Languages",
         RFC 1766, March 1995, <http://www.ietf.org/rfc/rfc1766.txt>.

   [14]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
         Extensions (MIME) Part Two: Media Types", RFC 2046, November
         1996, <http://www.ietf.org/rfc/rfc2046.txt>.

   [15]  Whitehead, E. and M. Murata, "XML Media Types", RFC 2376, July
         1998, <http://www.ietf.org/rfc/rfc2376.txt>.

   [16]  The Society of Motion Picture and Television Engineers, "SMPTE
         STANDARD for Television, Audio and Film - Time and Control
         Code", ANSI 12M-1999, September 1999.

   [17]  ISO, TC154., "Data elements and interchange formats --
         Information interchange -- Representation of dates and times",
         ISO 8601, 2000.


Authors' Addresses

   Silvia Pfeiffer
   Commonwealth Scientific and Industrial Research Organisation CSIRO, Australia
   PO Box 76
   Epping, NSW  1710
   Australia

   Phone: +61 2 9372 4180
   Email: Silvia.Pfeiffer@csiro.au
   URI:   http://www.ict.csiro.au/


   Conrad D. Parker
   Commonwealth Scientific and Industrial Research Organisation CSIRO, Australia
   PO Box 76
   Epping, NSW  1710
   Australia

   Phone: +61 2 9372 4222
   Email: Conrad.Parker@csiro.au
   URI:   http://www.ict.csiro.au/


   Andre T. Pang
   Commonwealth Scientific and Industrial Research Organisation CSIRO, Australia
   PO Box 76
   Epping, NSW  1710
   Australia

   Phone: +61 2 9372 4222
   Email: Andre.Pang@csiro.au
   URI:   http://www.ict.csiro.au/

Appendix A.  Definitions of terms and abbreviations

   Time-continuously sampled data: any sequence of binary data that
      represents an analog-time signal sampled in discrete time steps.
      In contrast to actual discrete-time signals as known from signal
      processing, time-continuously sampled data may also come in
      compressed form, such that a block of numbers represents an
      interval of time.
   Time-instantaneous bitstream: a time-continuously sampled data stream
      where the components provide information for a specific
      time-instant.
   Time-continuous bitstream: a time-continuously sampled data stream
      where the components provide ongoing information as time goes by.
   Clip: a temporal section of a time-continuous data stream.
   Annotation: a free-text, unstructured description of a clip.
   Metadata: a name-value pair that provides a structured, database-like
      description of the content.
   Hyperlink: a Uniform Resource Identifier (URI).
   Meta information: collection of information about a data stream,
      which may include annotations, hyperlinks, and metadata.
   Fragment: a subpart of a media document covering some temporal
      interval.
   Mark-up: XML tags and their content used to describe a media
      document.
   Annodex bitstream: encapsulated time-continuous bitstream with head
      and clip elements.
   Annotating: the task of giving textual descriptions to fragments of
      media documents.
   Indexing: the task of identifying index points for media documents or
      fragments thereof.
   Hyperlinking: the task of linking from one Web resource to another.
      If a link has an offset into the resource, this is sometimes
      called deep hyperlinking.
   Head element: CMML data containing information on an Annodexed media
      file.
   Media packet: a block of digital data that represents a temporal
      subpart of a stream of continuous media.  Media packets of one
      continuous media file do not overlap in time.
   Bitstream: a sequence of time-continuous data.
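
   The requirement that media packets of one continuous media file do
   not overlap in time can be checked mechanically.  The Python sketch
   below assumes, for illustration only, that each packet's temporal
   extent is already known as a half-open interval [start, end) in
   seconds, so that a packet ending exactly where the next one starts
   does not count as an overlap; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def packets_do_not_overlap(intervals: Sequence[Tuple[float, float]]) -> bool:
    """Return True if no two half-open [start, end) intervals overlap."""
    if any(end < start for start, end in intervals):
        raise ValueError("packet ends before it starts")
    # After sorting by start time, only neighbours can overlap.
    ordered = sorted(intervals)
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            return False
    return True
```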

Appendix B.  Glossary of acronyms

   CMML: Continuous Media Markup Language.
   DTD: Document Type Definition.
   XML: eXtensible Markup Language.
   CMWeb: Continuous Media Web.
   Web: World Wide Web.
   URI: Uniform Resource Identifier.

Appendix C.  Acknowledgments

   The authors gratefully acknowledge the contributions of Rob Collins,
   Zentaro Kavanagh, Andrew Nesbit and Simon Lai in developing this
   specification.

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Copyright Statement

   Copyright (C) The Internet Society (2005).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.


Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.