Ogg Skeleton
draft-swhited-ogg-skeleton-01
This document is an Internet-Draft (I-D).
Anyone may submit an I-D to the IETF.
This I-D is not endorsed by the IETF and has no formal standing in the
IETF standards process.
| Document | Type | Active Internet-Draft (individual) | |
|---|---|---|---|
| Author | Sam Whited | ||
| Last updated | 2026-04-13 | ||
| RFC stream | (None) | ||
| Intended RFC status | (None) | ||
| Formats | |||
| Additional resources |
Other Repository
Issue Tracker Related Implementations Wiki |
||
| Stream | Stream state | (No stream defined) | |
| Consensus boilerplate | Unknown | ||
| RFC Editor Note | (None) | ||
| IESG | IESG state | I-D Exists | |
| Telechat date | (None) | ||
| Responsible AD | (None) | ||
| Send notices to | (None) |
draft-swhited-ogg-skeleton-01
Internet Engineering Task Force ssw. Whited, Ed.
Internet-Draft 13 April 2026
Intended status: Informational
Expires: 15 October 2026
Ogg Skeleton
draft-swhited-ogg-skeleton-01
Abstract
Ogg Skeleton defines a logical bitstream that provides structuring
information for multitrack Ogg files. It provides clues for
synchronization and content negotiation including language selection.
It also provides keypoint indices for optimal seeking over high-
latency connections or in time-critical scenarios.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 15 October 2026.
Copyright Notice
Copyright (c) 2026 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document.
Whited Expires 15 October 2026 [Page 1]
Internet-Draft skeleton April 2026
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3
2. The skeleton logical bitstream . . . . . . . . . . . . . . . 3
2.1. Ident Header . . . . . . . . . . . . . . . . . . . . . . 4
2.2. Secondary Header . . . . . . . . . . . . . . . . . . . . 7
2.2.1. Message Header Fields . . . . . . . . . . . . . . . . 10
2.3. Keypoint Index Packets . . . . . . . . . . . . . . . . . 11
2.3.1. Keypoints . . . . . . . . . . . . . . . . . . . . . . 12
3. Ogg Media Mapping . . . . . . . . . . . . . . . . . . . . . . 13
4. Time Handling . . . . . . . . . . . . . . . . . . . . . . . . 14
4.1. Conceptual Overview . . . . . . . . . . . . . . . . . . . 15
4.2. Mapping a granule position to a time position . . . . . . 16
4.3. Seeking into the bitstream . . . . . . . . . . . . . . . 19
4.4. Remultiplexing an Ogg Bitstream . . . . . . . . . . . . . 20
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21
6. Security Considerations . . . . . . . . . . . . . . . . . . . 21
7. Normative References . . . . . . . . . . . . . . . . . . . . 21
8. Informative References . . . . . . . . . . . . . . . . . . . 22
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 22
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 23
1. Introduction
Ogg [RFC3533] is a generic container format, enabling interleaving of
several tracks of frame-wise encoded content in a time-multiplexed
manner. As an example, an Ogg physical bitstream could encapsulate
several tracks of video encoded in [Theora] and multiple tracks of
audio encoded in Opus [RFC6716] or FLAC [RFC9639] at the same time.
A player that decodes such a bitstream could then play one video
channel as the main video playback, alpha-blend another one on top of
it (e.g. a caption track), play a main Opus audio track together with
several FLAC audio tracks simultaneously (e.g. as sound effects), and
provide a choice of Opus channels providing commentary in different
languages. Such a file is generally possible to create with Ogg, it
is however not possible to generically parse such a file, seek on it,
understand what codecs are contained in such a file, and dynamically
handle and play back such content.
Whited Expires 15 October 2026 [Page 2]
Internet-Draft skeleton April 2026
Ogg does not know anything about the content it carries and leaves it
to the media mapping of each codec to declare and describe itself.
There is no meta information available at the Ogg level about the
content tracks encapsulated within an Ogg physical bitstream. This
is particularly a problem if you want to parse an Ogg file to find
out what type of data it encapsulates without having access to the
codec used to decode that data, or want to seek to a temporal offset
without having to decode the data first (such as on a Web server that
serves media).
Skeleton is designed to overcome these problems. Skeleton is a
logical bitstream within an Ogg stream that contains information
about the other encapsulated logical bitstreams. For each logical
bitstream it provides information such as its media type, and
explains the way that Ogg pages are mapped to time.
Seeking in an Ogg file is typically implemented as a bisection search
for the seek target timestamp. However when seeking over a high
latency connection, such as the internet, such searches can be slow.
Some bitstreams, such as video streams, have keyframes. In order to
seek to a given temporal offset in a video stream, you must first
perform a bisection search to find the target frame, determine its
keyframe, and then perform another bisection search to locate that
keyframe and decode forwards to the temoporal offset. This can be
very slow. Ogg Skeleton provides an index of keyframes, or an index
of periodic samples on streams without the concept of a keyframe,
enabling seeking over high-latency connections optimally with "one
hop".
Ogg Skeleton is also designed to allow the creation of substreams
from Ogg physical bitstreams that retain the original timing
information. For example, when cutting out the segment between the
7th and the 59th second of an Ogg file, it is ideal to continue to
start this cut out file with a playback time of 7 seconds and not of
0. This is of particular interest if you're streaming this file from
a web server after a query for a temporal subpart such as in
http://example.com/video.ogv?t=7-59 .
1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
2. The skeleton logical bitstream
Whited Expires 15 October 2026 [Page 3]
Internet-Draft skeleton April 2026
2.1. Ident Header
The skeleton logical bitstream starts with an ident header containing
information for the complete Ogg physical bitstream. The ident
header has the following format:
Whited Expires 15 October 2026 [Page 4]
Internet-Draft skeleton April 2026
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identifier 'fishead\0' | 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Version major | Version minor | 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Presentationtime numerator | 12-15
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | 16-19
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Presentationtime denominator | 20-23
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | 24-27
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Basetime numerator | 28-31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | 32-35
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Basetime denominator | 36-39
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | 40-43
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| UTC | 44-47
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | 48-51
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | 52-55
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | 56-59
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | 60-63
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Segment length | 64-67
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | 68-71
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Content byte offset | 72-75
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | 76-79
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 1: Ident header packet layout
Fields with more than one byte length MUST be encoded Least
Significant Byte (LSB) first byte order.
Whited Expires 15 October 2026 [Page 5]
Internet-Draft skeleton April 2026
The fields in the skeleton ident header have the following meaning:
Identifier An 8 byte field that identifies this bitstream as Ogg
Skeleton. It contains the magic numbers:
* 0x66 'f'
* 0x69 'i'
* 0x73 's'
* 0x68 'h'
* 0x65 'e'
* 0x61 'a'
* 0x64 'd'
* 0x00 '\0'
Version major 2 byte unsigned integer signifying the major version
number of the skeleton bitstream. This document specifies the
major version 4.
Version minor 2 byte unsigned integer signifying the minor version
number of the skeleton bitstream. This document specifies the
minor version 0.
Presentationtime numerator & denominator 8 byte unsigned integer
each. Together they represent the time at which to start
presenting the Ogg physical bitstream given as a rational number.
The denominator represents the temporal resolution at which the
presentationtime is given. E.g. 5/1000 results in a
presentationtime of 0.005 sec. This enables a very high temporal
resolution without having to store floating point numbers. In a
newly created physical bitstream presentationtime and basetime are
the same. When remultiplexing a subpart of the stream, this
number MUST be adapted to the requested start time offset of the
newly created stream. Presentationtime MUST be larger than or
equal to zero.
Basetime numerator & denominator 8 byte signed integer each.
Together they represent the basetime of the Ogg physical bitstream
given as a rational number like the presentationtime. This number
is fixed once the physical bitstream is created and provides a
mapping to time for the beginning of the physical bitstream when
it starts with a granule position of 0.
Whited Expires 15 October 2026 [Page 6]
Internet-Draft skeleton April 2026
UTC A 20 byte string containing a UTC time in the form of
YYYYMMDDTHHMMSS.sssZ [ISO.8601.1988]. It associates a calendar
date and a wall-clock time with the basetime. If unused the UTC
field MUST be a sequence of 20 NUL bytes, making this ident packet
and thus the BoS page of the skeleton bitstream constant length.
Segment length 8 byte unsigned integer representing the total length
of the segment.
Content byte offset 8 byte unsigned integer representing the offset
of the first non-header page in the segment.
The possible temporal resolution of the presentation and basetime is
on the order of 2^-64. For example, the time formats in use for
media that are described in this document range from 1/24 to 1/60 for
the different [SMPTE] formats. This resolution is enough for any one
of these. It is also expected to accommodate any future needs of
time resolution for any other time format and time-continuously
sampled data.
A denominator of 0 in either presentationtime or basetime indicates
that the respective time is 0 regardless of the value of the
numerator.
2.2. Secondary Header
The skeleton secondary headers are a sequence of packets that each
contain information about one of the other logical bitstreams
contained within the Ogg stream. A skeleton secondary header packet
has the following format:
Whited Expires 15 October 2026 [Page 7]
Internet-Draft skeleton April 2026
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identifier 'fisbone\0' | 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Offset to message header fields | 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Serial number | 12-15
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Number of header packets | 16-19
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Granulerate numerator | 20-23
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | 24-27
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Granulerate denominator | 28-31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | 32-35
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Basegranule | 36-39
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | 40-43
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Preroll | 44-47
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Granuleshift | Padding/future use | 48-51
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Message header fields ... | 52-
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2: Secondary header packet layout
Fields that are longer than one byte MUST be encoded with LSB first
byte order.
The fields in a skeleton secondary header packet have the following
meaning:
Identifier An 8 byte field that identifies this packet as a skeleton
secondary header for identifying other logical bitstreams. It
contains the magic numbers:
* 0x66 'f'
* 0x69 'i'
Whited Expires 15 October 2026 [Page 8]
Internet-Draft skeleton April 2026
* 0x73 's'
* 0x62 'b'
* 0x6f 'o'
* 0x6e 'n'
* 0x65 'e'
* 0x00 '\0'
Offset to message header fields 4 byte unsigned integer that
contains the number of bytes used in this packet before the
message header fields. For the version of the skeleton bitstream
described in this document this number is fixed to 44. This field
accommodates future changes to the skeleton bitstream allowing
implementations to parse message header fields even if additional
fields have been added in a future revision of Skeleton.
Serial number 4 byte signed integer containing the bitstream serial
number of the logical bitstream described by this secondary header
packet.
Number of header packets 4 byte unsigned integer that contains the
number of header packets of the referenced logical bitstream
consisting of the BoS page and the secondary header pages that are
included before the Skeleton EoS page.
Granulerate numerator & denominator 8 byte signed integer each.
They represent the temporal resolution of the logical bitstream in
Hz given as a rational number in the same way as the basetime
field.
Basegranule 8 byte signed integer that represents the granule number
with which this logical bitstream starts, which is originally 0,
but will be a positive offset when only a subpart of the stream is
requested.
Preroll 4 byte unsigned integer that contains the number of packets
to pre-roll in order to decode a current packet correctly. This
is for example the case with Ogg Vorbis, which requires a pre-roll
of 2 packets.
Granuleshift A 1 byte unsigned integer describing whether to
partition the granule_position ([RFC3533], Section 6) into two for
that logical bitstream, and how many of the lower bits to use for
the partitioning. The upper bits signify a time-continuous
Whited Expires 15 October 2026 [Page 9]
Internet-Draft skeleton April 2026
granule position for an independently decodable and presentable
data granule. The lower bits are generally used to specify the
relative offset of dependent packets, such as predicted frames of
a video. Hence these can be addressed, though not decoded without
tracing back to the last fully decodable data granule. This is
the case with Ogg Theora; the general procedure is given in
Section 4.2.
Padding/future use 3 bytes of padding data that MUST be set to zero
but may be used in future versions of Skeleton.
Message header fields Header fields following the generic Internet
Message Format defined in [RFC2822]. Each header field consists
of a name followed by a colon (0x3a ":") and the field value.
Field names are case-insensitive. The field value MAY be preceded
by any amount of white space, though a single space (SP, ASCII
value 32) is RECOMMENDED. Multi-line header fields as described
in Section 2.2.3 of [RFC2822] MUST be supported.
2.2.1. Message Header Fields
The following message headers are REQUIRED:
Content-type Mime type of the content encoded in this stream, e.g.
audio/opus, video/theora, etc.
Role Describes the function of this track. Common examples are
"video/main", "audio/main", "text/caption". It is RECOMMENDED to
stick to the existing role names defined in Part Role of
[SkeletonHeaders].
Name A unique free text string which can be used to directly address
or identify the track and which may be shown in user interfaces
that list or allow for selection of tracks.
For more message headers, see [SkeletonHeaders].
As per [RFC2277], message header fields are considered protocol data,
i.e. they are not expected to have human readable text, and they MUST
be entirely encoded in UTF-8. In addition, the mandatory header
fields MUST be encoded in ASCII.
Whited Expires 15 October 2026 [Page 10]
Internet-Draft skeleton April 2026
2.3. Keypoint Index Packets
Before the Skeleton EoS page in the segment header pages come the
keypoint index packets. Index packets are OPTIONAL in a valid
Skeleton bitstream. If index packets are included, there SHOULD be
at least one keypoint packet for each content logical bitstream in
the Ogg stream.
In order to save space, the offsets and timestamps in keypoint
packets are stored as deltas, and then variable byte-encoded. The
offset and timestamp deltas store the difference between the
keypoint's offset and timestamp from the previous keypoint's offset
and timestamp. To calculate the page offset of a keypoint, calculate
the sum of the offset deltas of up to and including the keypoint in
question. The variable byte encoded integers are encoded using 7
bits per byte to store the integer's bits, and the high bit is set in
the last byte used to encode the integer. The bits and bytes are in
LSB first byte order. For example, the integer 7843, or 0001 1110
1010 0011 in binary, would be stored as two bytes: 0xBD 0x23, or 1011
1101 0010 0011 in binary.
Each index packet contains the following fields:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identifier 'index\0' | 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | Serial number | 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | Number of keypoints | 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | 12-15
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | Timestamp denominator | 16-19
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | 20-23
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | First sample time numerator | 24-27
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | 28-31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | Last sample end time numerator | 32-35
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | 36-39
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | Keypoints... | 40-43
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Whited Expires 15 October 2026 [Page 11]
Internet-Draft skeleton April 2026
Figure 3: Keypoint index packet layout
The fields in an index packet have the following meaning:
Identifier An 6 byte field that identifies this page as an index
packet. It contains the magic numbers:
* 0x69 'i'
* 0x6e 'n'
* 0x64 'd'
* 0x65 'e'
* 0x78 'x'
* 0x00 '\0'
Serial number 4 byte signed integer containing the bitstream serial
number of the Ogg logical bitstream described by this index
packet.
Number of keypoints 8 byte unsigned integer containing the number of
keypoints included at the end of this index packet. This value
MAY be zero.
Timestamp denominator 8 byte signed integer representing the
presentation time denominator for this stream. All timestamps,
including keypoint timestamps and first and last sample timestamps
are fractions of seconds over this denominator. The timestamp
denominator MUST NOT be 0.
First sample time numerator 8 byte signed integer representing the
numerator for the presentation time of the first sample in the
track.
Last sample end time numerator 8 byte signed integer representing
the numerator for the presentation end time of the last sample in
the track.
Keypoints 'Number of keypoints' key points, starting with the first
keypoint at offset 42.
2.3.1. Keypoints
Each keypoint comprises the following fields:
Whited Expires 15 October 2026 [Page 12]
Internet-Draft skeleton April 2026
1. The keypoint's page's byte offset delta, as a variable byte
encoded integer. This is the number of bytes that this keypoint
is after the preceeding keypoint's offset, or from the start of
the segment if this is the first keypoint. The keypoint's page
start is therefore the sum of the byte-offset-deltas of all the
keypoints which come before it.
2. The presentation time numerator delta of the first keypoint which
starts on the page at the keypoint's offset, as a variable byte
encoded integer. This is the difference from the previous
keypoint's timestamp numerator. The keypoint's timestamp
numerator is therefore the sum of all the timestamp numerator
deltas up to and including this keypoint's. Divide the timestamp
numerator sum by the timestamp denominator to determine the
presentation time of the keyframe in seconds
Keypoint's MUST be stored in increasing order by offset (and thus by
presentation time).
The byte offsets stored in keypoints are relative to the start of the
Ogg bitstream segment. For a physical Ogg bitstream made up of two
chained Oggs, the offsets in the second Ogg segment's bitstream's
index are relative to the beginning of the second Ogg in the chain,
not the first. If a physical Ogg bitstream is made up of chained
Oggs, the presence of an index in one segment does not imply that
there will be an index in any other segment.
The first-sample-time and last-sample-time are rational numbers, in
seconds. If the denominator is 0 for the first-sample-time or the
last-sample-time, then that value was unable to be determined at
indexing time, and is unknown.
The exact number of keypoints used to construct the index is up to
the indexer, but it is RECOMMENDED to include at most one keypoint
per every 64KB of data, or every 1000ms, whichever is least frequent.
For codecs which use key frames, indexers SHOULD create one keypoint
pointing to each key frame.
3. Ogg Media Mapping
Because Ogg may be used to encapsulate any any type of time-
continuous data stream it does not place requirements on the order of
data streams in the file. An Ogg physical bitstream with a Skeleton
track, however, MUST have the following page order:
* Skeleton Beginning-of-Stream (BoS) page.
* BoS pages of the other logical bitstreams.
Whited Expires 15 October 2026 [Page 13]
Internet-Draft skeleton April 2026
* Secondary header pages of all logical bitstreams (including
Skeleton secondary headers).
* Skeleton keypoint index packets (if present)
* Skeleton End-of-Stream (EoS) page.
* Data and EoS pages of logical bitstreams, excluding skeleton,
multiplexed in a time-synchronous fashion.
In addition, Skeleton has the following further restrictions:
* An Ogg segment MUST NOT contain more than one Skeleton logical
bitstream.
* The skeleton BoS page MUST only contain the Skeleton Ident
(fishead) header as a single packet.
* The secondary header pages of a Skeleton logical bitstream consist
only of fisbone header packets.
* The Skeleton stream MUST NOT contain any content pages or data
packets other than keypoint index packets.
* The skeleton EoS page MUST contain a single packet of length zero.
* The skeleton EoS page MUST appear before any data pages for any
other logical bitstream in the Ogg bitstream.
* The skeleton EoS page MUST end the skeleton logical bitstream. If
an Ogg stream parser reaches the skeleton EoS page, it knows that
it has received all the BoS and secondary header pages and can
start setting up its decoding or parsing environment.
4. Time Handling
With time-continuous data inside Ogg, one needs to handle data at
four different levels:
* at the byte level: upon seeking,
* at the packet level: upon encapsulating,
* at the granule level: upon recomposing, and
* at the time level: upon displaying and addressing.
This section explains how they all fit together.
Whited Expires 15 October 2026 [Page 14]
Internet-Draft skeleton April 2026
4.1. Conceptual Overview
Ogg bitstreams represent a single timeline with multiple content
tracks. All of these tracks relate to the same timeline which starts
at a certain time point and ends when the last bitstream ends.
An example bitstream can be seen in the following figure. It
consists of an Ogg bitstream that contains 4 media bitstreams. The
picture is a conceptual representation of the time intervals covered
by the different logical bitstreams and the Ogg pages used to
encapsulate the data. In the flat representation these are
multiplexed such that the data packets of each of these bitstreams
occur at the correct time.
t_url
|
t_0 v t_n
|------------------------------------------------------------------->|
----------------------------------------------
| | | | | | | | | | |//| | | | |
----------------------------------------------
audio bitstream 1
-------------------------------------------------------------
| | | |/////| | | | | | |
-------------------------------------------------------------
video bitstream 1
----------------------------------------------------
| | | | |//| | | | | | | | | | | | |
----------------------------------------------------
audio bitstream 2
-------------------------------
| |/////| | | |
-------------------------------
video bitstream 2
Figure 4: Ogg bitstream layout example
The time point at which an Ogg bitstream starts (t_0 in Figure 4) is
called the "basetime" and represents the time in seconds associated
with the granule position of 0 on all logical bitstreams. Typically,
a newly created Ogg file starts all its logical bitstreams at granule
position 0, and a typical extract of an Ogg bitstream, such as the
one starting at t_url in Figure 4, starts each of its logical
bitstreams at different granule positions. These granule positions
are stored in the "basegranule" field of the skeleton secondary
header packets.
Whited Expires 15 October 2026 [Page 15]
Internet-Draft skeleton April 2026
The "basetime" of an Ogg bitstream may be 0, but it can also be any
positive time. For example, in professional video production, the
first frame of video of a program normally refers to a SMPTE basetime
[SMPTE] of 01:00:00:00, not 00:00:00:00. Associating such a practice
to a digital video resource requires a way to store that basetime
with the resource and interpreting it correctly when addressing
offsets such as t_url. Skeleton provides such a mapping through the
basetime field in the skeleton ident header.
Also associated with the basetime is an [ISO.8601.1988] calendar date
and wall-clock time (a "UTC base") which represent a real-world time
giving some meaningful calendar date association to the content such
as the creation time or the first presentation time. The UTC base is
specified in the "UTC" field of the Skeleton ident header.
4.2. Mapping a granule position to a time position
Each one of the encapsulated data bitstreams have their own temporal
resolution at which they provide data to cover the given timeline.
This temporal resolution is usually given through the sampling rate
of the particular bitstream. For example, a raw audio bitstream at
CD quality is sampled with a sampling rate of 44100 Hz. A video
bitstream may be sampled with a frame rate of 25 frames per second.
This temporal resolution is called the "granulerate". A granule is a
data element that is based on a regular data rate specific to the
content type, such as the frame rate for video or the sampling rate
for audio. It even exists for bitstreams that are not sampled at a
regular rate—then it is the highest resolution of any of the used
sampling rates. The granulerate is specified in the skeleton
secondary header packets for each logical bitstream.
Each one of the bitstreams insert data into the Ogg bitstream through
packets which have an associated temporal duration based on the
encoder packaging. Packets are packaged into Ogg pages, which have a
granule position associated with them ([RFC3533], Section 6). Not
taking the special case of a granuleshift into account, the granule
position specifies the number of granules that have been encapsulated
since the implicit start of the original bitstream until and
including the given Ogg page.
The granule position together with the granulerate and granuleshift
information of the skeleton secondary header packets for the
particular logical bitstream are used for the calculation of the time
position for which a data packet of the logical bitstream completes
data. A granule position of -1 indicates a special case and MUST NOT
be used for calculation of a mapping to time.
Whited Expires 15 October 2026 [Page 16]
Internet-Draft skeleton April 2026
In principle, the granule position of an Ogg page divided by the
granulerate of this page's logical bitstream provides the time
position that is reached in that bitstream after decoding all data
packets finished on this page. However, the granule position field
in an Ogg page allows for a more finely-grained description of the
temporal position. The following image explains the composition of
the granule position field in an Ogg page:
granule_position
------------------------------------------------
| keyindex | keyoffset |
------------------------------------------------
Figure 5: Granule position field layout
The granuleshift field of the skeleton secondary header packets
describes how many of the granule_position's 64 bits are being used
for the keyoffset. The keyoffset part of the granule_position is
commonly used when the logical bitstream consists of packets that can
only be fully decoded when referring back to a previous packet. For
example, video streams often consist of inter and intra coded frames,
where the intra frames are fully decodable and the inter frames are
intermediate frames that require backtracking to the last inter frame
for accurate decoding. Another example is a logical bitstream that
is mapped as instantaneous information (i.e. their granuleposition
represents the start time and the end time of the packet data), but
actually has a duration associated to it, which is provided through a
subsequent packet. The keyindex part of the granule_position is then
used to provide the temporal position of the reference packet and the
keyoffset part provides a counter for the data in between.
The calculation of the temporal position of an Ogg page using
Skeleton is thus specified through the following formula:
t_page = basetime + ((keyindex + keyoffset) / granulerate)
Figure 6: Ogg page to temporal position formula
The basetime provides the time offset used at the beginning of the
logical bitstream for the first data packet and thus MUST be added
for a correct calculation of the temporal position.
As an example regard an audio bitstream that has a granulerate of
44100 (i.e. 44100 samples per 1 sec), a granuleshift of 0, and starts
at 4 sec. When reaching a granule_position of 88200, this maps to a
time position of 6 seconds:
t_page = 4 + ((88200 + 0) / 44100) = 6
Whited Expires 15 October 2026 [Page 17]
Internet-Draft skeleton April 2026
Figure 7
This signifies that the bitstream has reached the second sec of the
audio bitstream after the end of decoding this page's packets, but
maps to 6 seconds because of the basetime.
As another example consider a video bitstream that has a granulerate
of 25 (i.e. 25 frames per 1 second), a granuleshift of 3 (because it
encodes - say - 7 partial frames between each fully encoded frame),
and starts at 0 sec. When reaching a granule_position of 997, i.e.
a keyindex of 62 and a keyshift of 5, this maps to a fully decodable
time position of 2.68 seconds:
t_page = 0 + ((62 + 5) / 25) = 2.68 sec
Figure 8
The granulerate of a time-instantaneous bitstream can be chosen
arbitrarily by the bitstream multiplexer. Per default, a granulerate
of 1000 is used, which is the resolution of npt. The resolution of
all the time schemes is given as:
npt 1000 (milliseconds)
smpte-24 24 (24 fps)
smpte-24-drop 24/1.001 = 23.976 (approx. as per SMPTE)
smpte-25 25
smpte-30 30
smpte-30-drop 30/1.001 = 29.970 (approx. as per SMPTE)
smpte-50 50
smpte-60 60
smpte-60-drop 60/1.001 = 59.940 (approx. as per SMPTE)
The granule position of the page finishing data of a time-
instantaneous bitstream packet MUST signify the start time of that
packet. For example, a CMML bitstream with a granulerate of 1000, a
basetime of 0, and a clip that lasts from npt=12.020 till npt=15.0
will get a granule_position of 12020. In contrast, the
granule_position of the page finishing data of e.g. an audio
bitstream with granulerate 44100, basetime 0 and containing data from
npt=12.020 to npt=15.0 will be 661500.
Whited Expires 15 October 2026 [Page 18]
Internet-Draft skeleton April 2026
A note about field overflows: an overflow of the granule position
field can destroy the temporal integrity of the Ogg physical
bitstream. In this case, a multiplexer MUST end the Ogg physical
bitstream and restart a new one resetting the counter to 0 and
adjusting the basetime appropriately. This is also called sequential
multiplexing in Ogg. The same measure MUST be taken in case of an
overflow of the page_sequence_number on one of the logical
bitstreams.
4.3. Seeking into the bitstream
Seeking to a time offset inside an Ogg logical bitstream is a
fundamental activity frequently performed on media data. Time inside
an Ogg with a Skeleton track is specified as a temporal offset from
the "beginning" of the stream, making use of the basetime field.
Time offsets can also be specified as calendar dates and times. The
UTC base is then used as a basis for offsetting.
The basetime allows to correctly map a temporal offset point such as
a temporal URI to a byte position in the stream. In the above figure
take t_uri=npt:14.0 as the temporal offset addressed on a stream with
t_0=npt:5.0 as the basetime—this requires a stream offsetting of only
9 sec to the appropriate granule position in each of the bitstreams,
in the figure marked through patterned pages.
The seeking action is performed on the interleaved bitstream, in
which the data packets occur in a temporally consecutive order based
on the time at which their data ends. These times are represented in
the granule positions of the Ogg pages, which are only allowed to
monotonically increase within one logical bitstream. This implies
that when having found an Ogg page with a granule position that maps
to a given seek time (i.e. covers the time or ends at it), the seek
has found the right location. This applies over all logical
bitstreams. In the above example, this means that the byte position
of the first occurring page of the patterned pages has been found.
There is a complication to the seeking: some logical bitstreams have
backwards dependencies in their data packets and these have to be
taken into account for seeking. For example, a logical bitstream may
require several of its previous packets to allow a correct and
complete decoding of the actual packet that occurs at the seektime.
This is the case for Theora which requires to go back to the previous
keyframe when decoding from a time offset. It is also the case for
Vorbis which requires the previous 2 packets for accurate setup of
the frequency transform - Speex needs approximately 2 packets for
similar reasons. Even instantaneous bitstreams may require to go
back to a previous packet to recover the last state information.
Whited Expires 15 October 2026 [Page 19]
Internet-Draft skeleton April 2026
Therefore, once seeking has located the correct byte position that
refers to the given temporal offset, it MUST seek back. For logical
bitstreams that have a non-zero "granuleshift" in the skeleton, it
MUST seek back to the Ogg page that has a "keyindex" granule
position. For logical bitstreams that have a non-zero "preroll" in
the skeleton, it MUST seek back that many packets. The earliest byte
position that satisfies all these requirements is the correct seek
position.
A player that presents from an offset MUST take into account that the
bitstream may contain some packets that are only there to allow
accurate decoding of the seek time. When the backwards dependencies
were resolved for a specific logical bitstream, several non-relevant
Ogg pages of may also have ended up in the intermediate. These have
to be skipped by a player. The time that a player MUST start
presenting from is given in the "presentationtime" in the skeleton
ident header.
4.4. Remultiplexing an Ogg Bitstream
Ogg with a Skeleton track allows for the creation of mashups of a
file without actual decoding and re-encoding. A mashup in the sense
used here is when a subpart of a Ogg physical bitstream is required,
such as a temporal sub-interval from the whole file. Skeleton allows
the creation of the mashup bitstream through recomposition and
remultiplexing. There are several aims for performing the
remultiplexing with as little effort and therefore as little delay as
possible:
* no decoding of the logical bitstreams is performed.
* no changes to the pages, in particular to the granule positions
are made.
* changes occur only to the control section.
The fields of the skeleton track allow achievement of all these aims.
Remultiplexing is essentially achieved by seeking to the position as
described above and then including from each logical bitstream only
the relevant Ogg pages into the new stream. Changes to fields in the
bitstream are restricted to the control section:
* the "presentationtime" MUST be adjusted to the requested start
time
Whited Expires 15 October 2026 [Page 20]
Internet-Draft skeleton April 2026
* the "startgranule" for each logical bitstream MUST be adjusted to
the granule position at which each logical bitstream starts. This
is not the first granule position of the Ogg pages included into
the bitstream, but rather the last one that did not get included,
as it represents the start time of the bitstream.
Everything else, and in particular the Ogg pages, stay the same.
This is important to allow caching of such files as is required for
Web proxies.
5. IANA Considerations
// TODO: define an Ogg Skeleton Header Registry and register the
// stuff from the [SkeletonHeaders] wiki page? This memo includes no
request to IANA.
6. Security Considerations
Ogg format bitstreams contain several multiplexed binary and non-
binary data bitstream. There is no generic encryption or signing
mechanism provided for the complete bitstream or anyone of its parts.
As the format of the encapsulated media bitstreams is not prescribed
and is identified through the "Content-type" Message header field in
that bitstream's skeleton secondary header packet, it is possible to
encrypt or sign that media bitstream and then mark it accordingly
with a MIME type that signifies the encryption. It is up to the
applications that use this bitstream to provide an appropriate codec
to handle such bitstreams.
As Ogg format bitstreams generally contain arbitrary bitstreams, it
is possible to include executable content in them. This can be an
issue with applications that decode these bitstreams, especially when
they are used in a network scenario. Such applications MUST ensure
correct handling of manipulated bitstreams, of buffer overflow and
the like.
7. Normative References
[RFC2822] Resnick, P., Ed., "Internet Message Format", RFC 2822,
DOI 10.17487/RFC2822, April 2001,
<https://www.rfc-editor.org/info/rfc2822>.
[RFC3533] Pfeiffer, S., "The Ogg Encapsulation Format Version 0",
RFC 3533, DOI 10.17487/RFC3533, May 2003,
<https://www.rfc-editor.org/info/rfc3533>.
Whited Expires 15 October 2026 [Page 21]
Internet-Draft skeleton April 2026
[ISO.8601.1988]
International Organization for Standardization, "Data
elements and interchange formats - Information interchange
- Representation of dates and times", ISO Standard 8601,
June 1988.
8. Informative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC2277] Alvestrand, H., "IETF Policy on Character Sets and
Languages", BCP 18, RFC 2277, DOI 10.17487/RFC2277,
January 1998, <https://www.rfc-editor.org/info/rfc2277>.
[RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the
Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716,
September 2012, <https://www.rfc-editor.org/info/rfc6716>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[RFC9639] van Beurden, M.Q.C. and A. Weaver, "Free Lossless Audio
Codec (FLAC)", RFC 9639, DOI 10.17487/RFC9639, December
2024, <https://www.rfc-editor.org/info/rfc9639>.
[SkeletonHeaders]
Xiph.Org, "Skeleton Headers", March 2026,
<https://wiki.xiph.org/SkeletonHeaders>.
[SMPTE] SMPTE, "SMPTE STANDARD for Television, Audio and Film -
Time and Control Code", ANSI 12M-1999, September 1999.
[Theora] Xiph.Org, "Theora Specification", 3 June 2017,
<https://xiph.org/theora/doc/Theora.pdf>.
Acknowledgements
Thanks to Silvia Pfeiffer, and Conrad D. Parker for their work on an
earlier attempt to specify the Skeleton bitstream which was consulted
(and stolen from) heavily while writing this document. Thanks also
to Christopher Montgomery and Andre Pang who are thanked in the
earlier draft mentioned previously and (probably) deserve to be
thanked again. Also to the contributors to the Ogg Skeleton
description on the Xiph.Org wiki.
Whited Expires 15 October 2026 [Page 22]
Internet-Draft skeleton April 2026
Author's Address
Sam Whited (editor)
Email: sam@samwhited.com
URI: https://blog.samwhited.com
Whited Expires 15 October 2026 [Page 23]