codec                                                         M. Graczyk
Internet-Draft                                               J. Skoglund
Intended status: Standards Track                             Google Inc.
Expires: May 25, 2017                                  November 21, 2016
Ambisonics in an Ogg Opus Container
draft-ietf-codec-ambisonics-01
Abstract
This document defines an extension to the Ogg format to encapsulate
ambisonics coded using the Opus audio codec.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on May 25, 2017.
Copyright Notice
Copyright (c) 2016 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Graczyk & Skoglund Expires May 25, 2017 [Page 1]
Internet-Draft Opus Ambisonics November 2016
Table of Contents
1. Introduction
2. Terminology
3. Ambisonics With Ogg Opus
3.1. Channel Mapping Family 2
3.2. Channel Mapping Family 3
4. Downmixing
4.1. Channel Mapping Family 2
4.2. Channel Mapping Family 3
5. Security Considerations
6. IANA Considerations
7. Acknowledgments
8. References
8.1. Normative References
8.2. Informative References
Authors' Addresses
1. Introduction
Ambisonics is a representation format for three-dimensional sound
fields which can be used for surround sound and immersive virtual
reality playback. See [gerzon75] and [daniel04] for technical
details on the ambisonics format. For the purposes of this
document, ambisonics can be considered a multichannel audio stream.
A separate stereo stream can be used alongside the ambisonics in a
head-tracked virtual reality experience to provide so-called
non-diegetic audio, that is, audio which should remain unchanged by
listener head rotation, e.g., narration or stereo music. Ogg is a
general-purpose container, supporting audio, video, and other media.
It can be used to encapsulate audio streams coded using the Opus
codec. See [RFC6716] and [RFC7845] for technical details on the Opus
codec and its encapsulation in the Ogg container, respectively.
This document extends the Ogg format by defining two new channel
mapping families for encoding ambisonics. The Ogg Opus format is
extended indirectly by adding items with values 2 and 3 to the IANA
"Opus Channel Mapping Families" registry. When 2 or 3 is used as
the Channel Mapping Family Number in an Ogg stream, the semantic
meaning of the channels in the multichannel Opus stream is one of the
ambisonics layouts defined in this document. This mapping can also
be used in other contexts which make use of the channel mappings
defined by the Opus Channel Mapping Families registry.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
[RFC2119].
3. Ambisonics With Ogg Opus
Ambisonics MAY be encapsulated in the Ogg format by encoding with the
Opus codec and setting the Channel Mapping Family value to 2 or 3 in
the Ogg Identification Header. A demuxer implementation encountering
Channel Mapping Family 2 or Family 3 MUST interpret the Opus stream
as containing ambisonics with the format described in Section 3.1 or
Section 3.2, respectively.
3.1. Channel Mapping Family 2
Allowed numbers of channels: (1 + n)^2 + 2j for n = 0...14 and j = 0
or 1, where n denotes the ambisonic order and j indicates whether
there is a separate non-diegetic stereo stream. This corresponds to
periphonic ambisonics from zeroth to fourteenth order plus
potentially two channels of non-diegetic stereo. Explicitly, the
allowed numbers of channels are 1, 3, 4, 6, 9, 11, 16, 18, 25, 27, 36,
38, 49, 51, 64, 66, 81, 83, 100, 102, 121, 123, 144, 146, 169, 171,
196, 198, 225, 227.
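The channel count rule above can be checked mechanically. The
following Python sketch is illustrative only; the function name
family2_layout is hypothetical and not part of any Opus or Ogg API.
It recovers the ambisonic order and the presence of non-diegetic
stereo from a channel count, and regenerates the list of allowed
values.

```python
import math

MAX_ORDER = 14  # highest ambisonic order allowed by Channel Mapping Family 2


def family2_layout(channel_count):
    """Return (order, has_stereo) for a valid channel count, else None."""
    for has_stereo in (False, True):
        # Remove the two optional non-diegetic stereo channels (j = 1).
        ambisonic = channel_count - (2 if has_stereo else 0)
        if ambisonic < 1:
            continue
        root = math.isqrt(ambisonic)
        # The ambisonic part must be a perfect square (1 + n)^2.
        if root * root == ambisonic and root - 1 <= MAX_ORDER:
            return root - 1, has_stereo
    return None


# Every channel count that a one-octet channel count field can carry.
allowed = [c for c in range(1, 256) if family2_layout(c) is not None]
```

A demuxer could use such a check to reject an identification header
whose channel count does not describe a full-order layout.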
This channel mapping uses the same channel mapping table format used
by channel mapping families 1 and 255. The output channels are
ambisonic components ordered in Ambisonic Channel Number (ACN) order,
defined in Figure 1, followed by two optional channels of non-
diegetic stereo indexed (left, right).
ACN = n * (n + 1) + m,
for order n and degree m.
Figure 1: Ambisonic Channel Number (ACN)
For the ambisonic channels, the ACN corresponds to the channel
index k as k = ACN + 1. The reverse correspondence can also be
computed for an ambisonic channel with index k.
order n = ceil(sqrt(k)) - 1,
degree m = k - n * (n + 1) - 1.
Figure 2: Ambisonic Degree and Order from ACN
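As an illustration, the sketch below converts between (order, degree)
pairs and ACN in Python. It works directly from the definition in
Figure 1: since ACN ranges from n^2 to (n + 1)^2 - 1 for order n, the
order is the integer square root of the ACN. The function names are
hypothetical.

```python
import math


def acn_from_order_degree(n, m):
    """ACN = n * (n + 1) + m, valid for degrees -n <= m <= n."""
    if n < 0 or not -n <= m <= n:
        raise ValueError("degree m must satisfy -n <= m <= n")
    return n * (n + 1) + m


def order_degree_from_acn(acn):
    """Invert Figure 1 for a zero-based ACN."""
    if acn < 0:
        raise ValueError("ACN must be non-negative")
    n = math.isqrt(acn)      # order n covers ACN n^2 .. (n + 1)^2 - 1
    m = acn - n * (n + 1)
    return n, m


def order_degree_from_channel_index(k):
    """Same as above for the one-based channel index k = ACN + 1."""
    return order_degree_from_acn(k - 1)
```

Using an integer square root avoids the rounding issues that a
floating-point sqrt can introduce at perfect squares.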
Ambisonic channels are normalized with Schmidt Semi-Normalization
(SN3D). The interpretation of the ambisonics signal as well as
detailed definitions of ACN channel ordering and SN3D normalization
are described in [ambix] Section 2.1.
3.2. Channel Mapping Family 3
In this mapping, C output channels are generated at the decoder by
multiplying N decoded streams with a designated demixing matrix, D,
having C rows and N columns. The number of output channels does not
need to correspond to a full ambisonic order representation. This
mapping allows for encoding and decoding of full order ambisonics,
mixed order ambisonics, and for non-diegetic stereo channels. Let X
denote a column vector containing N decoded streams X1, X2, ..., XN,
and let S denote a column vector containing C output streams S1, S2,
..., SC. Then S = D X, i.e.,
   /     \   /                     \ /     \
   | S1  |   | D11  D12  ...  D1N  | | X1  |
   | S2  |   | D21  D22  ...  D2N  | | X2  |
   | ... | = | ...  ...  ...  ...  | | ... |
   | SC  |   | DC1  DC2  ...  DCN  | | XN  |
   \     /   \                     / \     /
Figure 3: Demixing in Channel Mapping Family 3
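The product S = D X is applied independently to every sample frame.
A minimal pure-Python sketch follows, assuming each decoded stream is
available as a list of samples of equal length; the function name
demix and this data layout are illustrative choices, not part of the
format.

```python
def demix(matrix, streams):
    """Compute S = D X.

    matrix:  C rows, each with N coefficients (the demixing matrix D).
    streams: N lists of samples (the decoded streams X1..XN).
    Returns C lists of samples (the output channels S1..SC).
    """
    n = len(streams)
    if any(len(row) != n for row in matrix):
        raise ValueError("every matrix row needs one coefficient per stream")
    frames = len(streams[0]) if streams else 0
    if any(len(stream) != frames for stream in streams):
        raise ValueError("all streams must have the same length")
    return [
        [sum(row[j] * streams[j][t] for j in range(n)) for t in range(frames)]
        for row in matrix
    ]
```

For example, with two streams and three output channels:

```python
d = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
x = [[1.0, 2.0], [3.0, 4.0]]
s = demix(d, x)
```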
The matrix MUST be provided as side information and MUST be stored in
the channel mapping table part of the identification header, cf.
Section 5.1.1 of [RFC7845]. For channel mapping family 3, the mapping
table has the following layout:
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                                   +-+-+-+-+-+-+-+-+
                                                   | Stream Count  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Coupled Count |               Channel Numbering               :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Demixing Matrix                        :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4: Channel Mapping Table for Channel Mapping Family 3
The fields in the channel mapping table have the following meaning:
1. Stream Count 'N' (8 bits, unsigned):
This is the total number of streams encoded in each Ogg packet.
2. Coupled Stream Count 'M' (8 bits, unsigned):
This is the number of the N streams whose decoders are to be
configured to produce two channels (stereo).
3. Output Channel Numbering (8*C bits):
This contains one unsigned octet per output channel, indicating which
ambisonic channel the output channel corresponds to. Let 'index'
be the value of this octet for a particular output channel. If
'index' is less than 254, it equals the ACN of the
corresponding channel. If 'index' is 254, it means the
corresponding channel contains the left channel of a non-diegetic
stereo stream. If 'index' is 255, it means the corresponding
channel contains the right channel of a non-diegetic stereo
stream.
4. Demixing Matrix (32*N*C bits):
The coefficients of the demixing matrix, stored column-wise as
32-bit little-endian floating-point values.
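The field layout above can be read with a few lines of Python. This
is a sketch under stated assumptions: the function name
parse_family3_table is hypothetical, "column-wise" is read as
column-major order (all C coefficients of the first column, then the
second, and so on), and the table is assumed to have already been
sliced out of the identification header. The output channel count C
comes from the header itself and is passed in.

```python
import struct


def parse_family3_table(table, channel_count):
    """Parse a Channel Mapping Family 3 table from a bytes object.

    Returns (stream_count, coupled_count, numbering, matrix), where
    matrix is a list of C rows with N coefficients each.
    """
    if len(table) < 2:
        raise ValueError("table too short for the two count fields")
    stream_count, coupled_count = table[0], table[1]
    # RFC 7845 requires the coupled count to be no larger than N.
    if coupled_count > stream_count:
        raise ValueError("coupled count exceeds stream count")

    numbering_end = 2 + channel_count
    coefficients = stream_count * channel_count
    # Verify the length before reading so nothing is read out of bounds.
    if len(table) != numbering_end + 4 * coefficients:
        raise ValueError("table length does not match N and C")

    numbering = list(table[2:numbering_end])
    flat = struct.unpack_from("<%df" % coefficients, table, numbering_end)
    # Assumed column-major storage: column j holds flat[j*C : (j+1)*C].
    matrix = [
        [flat[j * channel_count + i] for j in range(stream_count)]
        for i in range(channel_count)
    ]
    return stream_count, coupled_count, numbering, matrix
```

Checking the total length before unpacking keeps the parser from
reading past the end of a truncated or malformed header.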
Note that [RFC7845] specifies that the identification header cannot
exceed one "page", which is 65,025 octets. This sets a practical
maximum ambisonic order of 10 if full order is used and the number
of coded streams equals the number of ambisonic channels plus the
two non-diegetic channels: order 10 gives C = N = 123 and a demixing
matrix of 4 * 123 * 123 = 60,516 octets, whereas order 11 gives
C = N = 146 and a matrix of 85,264 octets. Also note that the total
output channel number, C, MUST be set in the third field of the
identification header.
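The size limit can be verified with a short calculation. The sketch
below assumes one coded stream per output channel (N = C) and counts
the 19 octets that precede the channel mapping table in the
identification header of [RFC7845] (magic signature, version, channel
count, pre-skip, input sample rate, output gain, and mapping family).
The names are illustrative.

```python
MAX_PAGE_PAYLOAD = 255 * 255   # 65,025 octets in one Ogg page
FIXED_HEADER_OCTETS = 19       # header fields before the mapping table


def family3_header_size(order, stereo=True):
    """Identification header size in octets for a full-order layout."""
    channels = (order + 1) ** 2 + (2 if stereo else 0)
    streams = channels  # assumption: one coded stream per output channel
    # Stream count + coupled count + channel numbering + demixing matrix.
    table = 1 + 1 + channels + 4 * streams * channels
    return FIXED_HEADER_OCTETS + table


largest_order = max(
    order for order in range(15)
    if family3_header_size(order) <= MAX_PAGE_PAYLOAD
)
```

With fewer coded streams than output channels (N < C), the matrix
shrinks proportionally, so the limit applies to the product N * C
rather than to the order alone.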
4. Downmixing
4.1. Channel Mapping Family 2
An Ogg Opus player MAY use the matrix in Figure 5 to implement
downmixing from multichannel files using Channel Mapping Family 2
(Section 3.1) when there is no non-diegetic stereo. This downmixing
is known to give acceptable results for stereo downmixing from
ambisonics. The first and second ambisonic channels are known as "W"
and "Y", respectively.
   /   \   /                  \ /     \
   | L |   | 0.5  0.5 0.0 ... | |  W  |
   | R | = | 0.5 -0.5 0.0 ... | |  Y  |
   \   /   \                  / | ... |
                                \     /
Figure 5: Stereo Downmixing Matrix for Channel Mapping Family 2 -
only Ambisonic Channels
The first ambisonic channel (W) is a mono audio stream which
represents the average audio signal over all directions. Since W is
not directional, Ogg Opus players MAY use W directly for mono
playback.
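A player-side sketch of this downmix in Python follows. Only W and Y
contribute, so the remaining ambisonic channels are never touched.
The function names and the list-of-lists sample layout are
assumptions made for illustration.

```python
def downmix_ambisonics_to_stereo(channels):
    """Stereo downmix of ambisonic channels in ACN order (W first, Y second)."""
    if len(channels) < 2:
        raise ValueError("need at least the W and Y channels")
    w, y = channels[0], channels[1]
    left = [0.5 * a + 0.5 * b for a, b in zip(w, y)]
    right = [0.5 * a - 0.5 * b for a, b in zip(w, y)]
    return left, right


def downmix_ambisonics_to_mono(channels):
    """Mono playback uses W directly, since W is not directional."""
    return list(channels[0])
```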
If a non-diegetic stereo track is present, the player MAY use the
matrix in Figure 6 for downmixing. Ls and Rs denote the two non-
diegetic stereo channels.
   /   \   /                            \ /     \
   | L |   | 0.25  0.25 0.0 ... 0.5 0.0 | |  W  |
   | R | = | 0.25 -0.25 0.0 ... 0.0 0.5 | |  Y  |
   \   /   \                            / | ... |
                                          | Ls  |
                                          | Rs  |
                                          \     /
Figure 6: Stereo Downmixing Matrix for Channel Mapping Family 2 -
Ambisonic Channels Plus a Non-diegetic Stereo Stream
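When the non-diegetic stereo pair is present, it occupies the last
two channels in Channel Mapping Family 2. The Python sketch below
applies the matrix in Figure 6 under that layout; the function name
is hypothetical, and layouts without a Y channel (zeroth order plus
stereo) are left out of scope.

```python
def downmix_with_non_diegetic(channels):
    """Stereo downmix of [W, Y, ..., Ls, Rs] using the Figure 6 weights."""
    if len(channels) < 4:
        raise ValueError("need W, Y and the two non-diegetic channels")
    w, y = channels[0], channels[1]
    ls, rs = channels[-2], channels[-1]
    left = [0.25 * a + 0.25 * b + 0.5 * c for a, b, c in zip(w, y, ls)]
    right = [0.25 * a - 0.25 * b + 0.5 * c for a, b, c in zip(w, y, rs)]
    return left, right
```

The ambisonic weights are halved relative to Figure 5 so that the
ambisonic bed and the non-diegetic pair each contribute half of the
output.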
4.2. Channel Mapping Family 3
In Channel Mapping Family 3, described in Section 3.2, additional
side information stored in the identification header is needed to
transform the coded streams into ambisonic and non-diegetic stereo
channels. It is therefore reasonable to also use this information
for stereo downmixing. Assume in the following that the output
channels contain the zeroth-order channel and a first-order channel,
ACN 0 and 1, also known as "W" and "Y", respectively. If we also
assume that the output channel numbering in Figure 4 is structured so
that the first output channel contains W and the second contains Y,
an Ogg Opus player MAY use the matrix in Figure 7 to implement
downmixing when there is no non-diegetic stereo. If the output
channels are ordered differently, the columns of the downmixing
matrix should be rearranged accordingly so that only W and Y
contribute to the downmix.
   /   \   /                  \ /     \   /                  \ /     \
   | L |   | 0.5  0.5 0.0 ... | |  W  |   | 0.5  0.5 0.0 ... | | S1  |
   | R | = | 0.5 -0.5 0.0 ... | |  Y  | = | 0.5 -0.5 0.0 ... | | S2  | =
   \   /   \                  / | ... |   \                  / | ... |
                                \     /                        \     /

     /                  \ /                     \ /     \
     | 0.5  0.5 0.0 ... | | D11  D12  ...  D1N  | | X1  |
   = | 0.5 -0.5 0.0 ... | | D21  D22  ...  D2N  | | X2  |
     \                  / | ...  ...  ...  ...  | | ... |
                          | DC1  DC2  ...  DCN  | | XN  |
                          \                     / \     /
Figure 7: Stereo Downmixing Matrix for Channel Mapping Family 3 -
only Ambisonic Channels
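Because the downmixing matrix has nonzero entries only in the W and Y
columns, the product of the two matrices in Figure 7 depends only on
the first two rows of D. A player can therefore precompute a 2 x N
matrix once and apply it to the decoded streams directly. A minimal
Python sketch, assuming D is held as a list of C rows with W in the
first row and Y in the second (the function name is hypothetical):

```python
def stereo_rows_from_demixing(d):
    """Collapse the downmix and demixing matrices into one 2 x N matrix."""
    if len(d) < 2:
        raise ValueError("demixing matrix needs rows for W and Y")
    w_row, y_row = d[0], d[1]
    left = [0.5 * (a + b) for a, b in zip(w_row, y_row)]
    right = [0.5 * (a - b) for a, b in zip(w_row, y_row)]
    return [left, right]
```

This avoids computing the C - 2 output channels that do not
contribute to the stereo signal.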
Similarly, if a non-diegetic stereo track is present, the player MAY
use the matrix in Figure 8 for downmixing. Ls and Rs denote the two
non-diegetic stereo channels, and it is assumed here that they are
the last two output channels. If the output channels are ordered
differently, the columns of the downmixing matrix should be
rearranged accordingly so that only W, Y, Ls, and Rs contribute to
the downmix.
   /   \   /                            \ /     \
   | L |   | 0.25  0.25 0.0 ... 0.5 0.0 | |  W  |
   | R | = | 0.25 -0.25 0.0 ... 0.0 0.5 | |  Y  | =
   \   /   \                            / | ... |
                                          | Ls  |
                                          | Rs  |
                                          \     /

     /                            \ /      \
     | 0.25  0.25 0.0 ... 0.5 0.0 | |  S1  |
   = | 0.25 -0.25 0.0 ... 0.0 0.5 | |  S2  | =
     \                            / | ...  |
                                    | SC-1 |
                                    |  SC  |
                                    \      /

     /                            \ /                     \ /     \
     | 0.25  0.25 0.0 ... 0.5 0.0 | | D11  D12  ...  D1N  | | X1  |
   = | 0.25 -0.25 0.0 ... 0.0 0.5 | | D21  D22  ...  D2N  | | X2  |
     \                            / | ...  ...  ...  ...  | | ... |
                                    | DC1  DC2  ...  DCN  | | XN  |
                                    \                     / \     /
Figure 8: Stereo Downmixing Matrix for Channel Mapping Family 3 -
Ambisonic Channels Plus a Non-diegetic Stereo Stream
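The rearrangement of columns for other channel orderings can be
driven by the Output Channel Numbering field, where values below 254
are ACNs and 254 and 255 mark the left and right non-diegetic
channels. The Python sketch below builds the 2 x C downmixing matrix
for an arbitrary ordering; the function name is hypothetical, and the
choice to fall back to the weights of Figure 7 when no complete
stereo pair is present is an assumption of this sketch.

```python
LEFT_STEREO = 254    # 'index' value for the non-diegetic left channel
RIGHT_STEREO = 255   # 'index' value for the non-diegetic right channel


def downmix_matrix(numbering):
    """Build [left_row, right_row] for the given output channel numbering."""
    has_stereo = LEFT_STEREO in numbering and RIGHT_STEREO in numbering
    gain = 0.25 if has_stereo else 0.5   # weights of Figure 8 or Figure 7
    left = [0.0] * len(numbering)
    right = [0.0] * len(numbering)
    for column, index in enumerate(numbering):
        if index == 0:                   # W (ACN 0)
            left[column] = gain
            right[column] = gain
        elif index == 1:                 # Y (ACN 1)
            left[column] = gain
            right[column] = -gain
        elif has_stereo and index == LEFT_STEREO:
            left[column] = 0.5
        elif has_stereo and index == RIGHT_STEREO:
            right[column] = 0.5
    return [left, right]
```

All other columns stay at zero, so only W, Y, Ls, and Rs contribute
regardless of where they appear among the output channels.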
5. Security Considerations
Implementations of the Ogg container need to take appropriate security
considerations into account, as outlined in Section 10 of [RFC7845].
The extension defined in this document requires that semantic meaning
be assigned to more channels than the existing Ogg format requires.
Since more allocations will be required to encode and decode these
semantically meaningful channels, care should be taken in any new
allocation paths. Implementations MUST NOT overrun their allocated
memory nor read from uninitialized memory when managing the ambisonic
channel mapping.
6. IANA Considerations
This document updates the IANA Media Types registry "Opus Channel
Mapping Families" to add two new assignments.
+-------+---------------------------+
| Value | Reference |
+-------+---------------------------+
| 2 | This Document Section 3.1 |
| | |
| 3 | This Document Section 3.2 |
+-------+---------------------------+
7. Acknowledgments
Thanks to Timothy Terriberry, Marcin Gorzel and Andrew Allen for
their guidance and valuable contributions to this document.
8. References
8.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>.
[RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the
Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716,
September 2012, <http://www.rfc-editor.org/info/rfc6716>.
[RFC7845] Terriberry, T., Lee, R., and R. Giles, "Ogg Encapsulation
for the Opus Audio Codec", RFC 7845, DOI 10.17487/RFC7845,
April 2016, <http://www.rfc-editor.org/info/rfc7845>.
[ambix] Nachbar, C., Zotter, F., Deleflie, E., and A. Sontacchi,
"AMBIX - A SUGGESTED AMBISONICS FORMAT", June 2011,
<http://iem.kug.ac.at/fileadmin/media/iem/projects/2011/
ambisonics11_nachbar_zotter_sontacchi_deleflie.pdf>.
8.2. Informative References
[gerzon75]
Gerzon, M., "Ambisonics. Part one: General system
description", August 1975,
<http://www.michaelgerzonphotos.org.uk/articles/
Ambisonics%201.pdf>.
[daniel04]
Daniel, J. and S. Moreau, "Further Study of Sound Field
Coding with Higher Order Ambisonics", May 2004,
<http://pcfarina.eng.unipr.it/Public/phd-thesis/
aes116%20high-passed%20hoa.pdf>.
Authors' Addresses
Michael Graczyk
Google Inc.
1600 Amphitheatre Parkway
Mountain View, CA 94043
USA
Email: mgraczyk@google.com
Jan Skoglund
Google Inc.
1600 Amphitheatre Parkway
Mountain View, CA 94043
USA
Email: jks@google.com