Deep Audio Redundancy (DRED) Extension for the Opus Codec
draft-valin-opus-dred-01

The information below is for an old version of the document.
Document Type
This is an older version of an Internet-Draft whose latest revision state is "Active".
Authors Jean-Marc Valin , Jan Buethe
Last updated 2023-07-07 (Latest revision 2023-03-08)
Internet Engineering Task Force                                JM. Valin
Internet-Draft                                                 J. Buethe
Updates: 6716 (if approved)                                       Amazon
Intended status: Standards Track                             7 July 2023
Expires: 8 January 2024

       Deep Audio Redundancy (DRED) Extension for the Opus Codec
                        draft-valin-opus-dred-01

Abstract

   This document proposes a mechanism for embedding very low bitrate
   deep audio redundancy (DRED) within the Opus codec (RFC6716)
   bitstream.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 8 January 2024.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Valin & Buethe           Expires 8 January 2024                 [Page 1]
Internet-Draft                  Opus DRED                      July 2023

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
     1.1.  Requirements Language . . . . . . . . . . . . . . . . . .   2
   2.  DRED Extension Format . . . . . . . . . . . . . . . . . . . .   2
   3.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   3
     3.1.  Opus Media Type Update  . . . . . . . . . . . . . . . . .   3
     3.2.  Mapping to SDP Parameters . . . . . . . . . . . . . . . .   4
   4.  Security Considerations . . . . . . . . . . . . . . . . . . .   4
   5.  References  . . . . . . . . . . . . . . . . . . . . . . . . .   4
     5.1.  Normative References  . . . . . . . . . . . . . . . . . .   4
     5.2.  Informative References  . . . . . . . . . . . . . . . . .   5
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .   5

1.  Introduction

   This document proposes a mechanism for embedding very low bitrate
   deep audio redundancy (DRED) within the Opus codec [RFC6716]
   bitstream.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  DRED Extension Format

   We use the Opus extension mechanism [opus-extension] to add deep
   redundancy within the padding of an Opus packet.  We use the
   extension ID 32, which means that the L flag signals whether a length
   code is included.  In this document, we define only the extension
   payload.  [Note: Until adoption by the IETF, experimental
   implementations of DRED MUST use the experimental extension ID 127 to
   avoid causing interoperability problems.]

   The principles behind the DRED mechanism defined in this extension
   are explained in [dred-paper].  All the data in the extension payload
   is encoded using the Opus entropy coder defined in Section 4.1 of
   [RFC6716].  Since some of the fields at the beginning of the payload
   are encoded with flat binary probabilities, they can still be
   interpreted as bits.

   The extension starts with an offset indicator, encoded as a signed
   5-bit integer (two's complement) in units of 2.5 ms.  The offset
   indicates the time of the last sample analysed for the transmitted
   features in the packet, measured from the time of the first sample in
   the Opus frame that contains the extension data.
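
   As an illustration, the conversion from the raw 5-bit field to a
   time offset can be sketched as follows.  The function name is
   hypothetical and the raw field is assumed to have already been
   extracted as an unsigned value between 0 and 31.

```python
def dred_offset_ms(raw: int) -> float:
    """Convert the raw 5-bit offset field to milliseconds.

    Hypothetical helper: `raw` is the field read as an unsigned
    integer in the range 0..31.
    """
    if not 0 <= raw <= 31:
        raise ValueError("offset field must fit in 5 bits")
    # Two's complement sign extension: 16..31 represent -16..-1.
    signed = raw - 32 if raw & 0x10 else raw
    # The field is expressed in units of 2.5 ms.
    return signed * 2.5
```

   Under this reading, the representable offsets span -40 ms to
   +37.5 ms in steps of 2.5 ms.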

   The offset is followed by a 4-bit initial quantizer field (Q0)
   ranging from 0 to 15.  That quantizer is used on the most recent
   frame encoded and is followed by the 3-bit quantizer slope dQ.  The
   3-bit dQ index selects from the following values: [0, 1/8, 3/16, 1/4,
   3/8, 1/2, 3/4, 1] quantizer step per frame.  The quantizer for frame
   k is thus given by: min(15, round(Q0 + dQ_table[dQ] * k)).  For
   example, using Q0=5 and dQ=2 (3/16), frame k=20 would use a quantizer
   of min(15, round(5 + 3/16 * 20)) = round(8.75) = 9.
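
   The quantizer schedule can be sketched as below.  This is a minimal
   illustration, not a reference implementation: the names are
   hypothetical, k=0 is taken to be the most recent frame, and exact
   ties (x.5) are assumed to round up, since the text does not state a
   tie-breaking rule.

```python
from fractions import Fraction
import math

# Quantizer slope values listed in the text, indexed by the 3-bit dQ field.
DQ_TABLE = [Fraction(0), Fraction(1, 8), Fraction(3, 16), Fraction(1, 4),
            Fraction(3, 8), Fraction(1, 2), Fraction(3, 4), Fraction(1)]

def dred_quantizer(q0: int, dq: int, k: int) -> int:
    """Return min(15, round(Q0 + dQ_table[dQ] * k)) for frame k."""
    if not (0 <= q0 <= 15 and 0 <= dq <= 7 and k >= 0):
        raise ValueError("field out of range")
    exact = q0 + DQ_TABLE[dq] * k  # exact rational arithmetic
    # Assumption: ties round up (floor(x + 1/2)); the draft only says "round".
    rounded = math.floor(exact + Fraction(1, 2))
    return min(15, rounded)
```

   With Q0=5 and dQ=2, dred_quantizer(5, 2, 20) reproduces the value 9
   from the example above.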

   The compressed redundancy information consists of an initial state
   coded with a pyramid vector quantizer (PVQ), followed by the entropy-
   coded latent representation.  The number of 40-ms DRED blocks is not
   coded explicitly.  Instead, the decoder MUST NOT decode blocks when
   fewer than 8 bits remain in the DRED payload.
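
   The implicit block count can be pictured as a decoding loop that
   stops once fewer than 8 bits remain.  The sketch below is only an
   outline under stated assumptions: bits_used and decode_block are
   hypothetical callbacks standing in for the entropy decoder's bit
   counter and for the per-block latent decoding, neither of which is
   defined here.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

MIN_BITS_FOR_BLOCK = 8   # threshold stated in the text
BLOCK_MS = 40            # duration covered by one DRED block

def decode_dred_blocks(total_bits: int,
                       bits_used: Callable[[], int],
                       decode_block: Callable[[], T]) -> List[T]:
    """Decode blocks until fewer than 8 bits remain in the payload.

    total_bits:   size of the DRED payload in bits.
    bits_used:    hypothetical callback returning bits consumed so far.
    decode_block: hypothetical callback decoding one 40-ms block.
    """
    blocks: List[T] = []
    while total_bits - bits_used() >= MIN_BITS_FOR_BLOCK:
        blocks.append(decode_block())
    return blocks
```

   The amount of redundancy recovered is then len(blocks) * BLOCK_MS
   milliseconds.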

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Offset |  Q0   |  dQ |    PVQ                                |
      +-+-+-+-+-+-+-+-+-+-+-+-+                                       +
      :                                                               :
      |            ...                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                               |  Latent coeffs                |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      :                                                               :
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                        Figure 1: Extension framing
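
   The fixed-size fields at the start of Figure 1 can be extracted as
   sketched below.  This assumes that the flat-probability fields can
   be read as plain bits, most significant bit first, in the order
   drawn in the figure; a complete decoder would instead read them
   through the entropy decoder of Section 4.1 of [RFC6716].  The type
   and function names are hypothetical.

```python
from typing import NamedTuple

class DredHeader(NamedTuple):
    offset_raw: int  # 5-bit field, still unsigned (two's complement)
    q0: int          # 4-bit initial quantizer, 0..15
    dq: int          # 3-bit quantizer slope index, 0..7

def parse_dred_header(payload: bytes) -> DredHeader:
    """Split the first 12 bits of the payload into Offset, Q0 and dQ.

    Assumption: fields are packed MSB-first as drawn in Figure 1.
    """
    if len(payload) < 2:
        raise ValueError("need at least 12 bits (2 bytes) of payload")
    first16 = (payload[0] << 8) | payload[1]
    offset_raw = (first16 >> 11) & 0x1F  # bits 0..4
    q0 = (first16 >> 7) & 0x0F           # bits 5..8
    dq = (first16 >> 4) & 0x07           # bits 9..11
    return DredHeader(offset_raw, q0, dq)
```

   The remaining bits, starting at bit 12, belong to the PVQ-coded
   initial state and the latent coefficients.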

3.  IANA Considerations

   [Note: Until the IANA performs the actions described below,
   implementers should use 127 instead of 32 as the extension number.]

   This document assigns ID 32 to the "Opus Extension IDs" registry
   created in [opus-extension] to implement the proposed DRED extension.

3.1.  Opus Media Type Update

   This document updates the audio/opus media type registration
   [RFC7587] to add the following two optional parameters:

   ext32-dred-duration: Specifies the maximum amount of DRED information
   (in milliseconds) that the receiver can use.  The receiver MUST be
   able to handle any valid DRED duration even if it does not make use
   of it.

   sprop-ext32-dred-duration: Maximum amount of DRED information (in
   milliseconds) that the sender is likely to use.  The receiver MUST be
   able to handle any valid DRED duration even if it does not make use
   of it.

3.2.  Mapping to SDP Parameters

   The media type parameters described above map to declarative SDP and
   SDP offer-answer in the same way as other optional parameters in
   [RFC7587].  Regardless of any a=fmtp SDP attribute specified, the
   receiver MUST be capable of receiving any signal.
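
   For illustration, the two parameters could appear in an a=fmtp
   attribute built as sketched below.  The helper name, the payload
   type 111, and the duration values are assumptions chosen for the
   example; they are not taken from this document.

```python
def opus_fmtp_line(payload_type: int, params: dict) -> str:
    """Format an a=fmtp SDP attribute from media type parameters.

    Hypothetical helper: parameters are emitted as name=value pairs
    separated by semicolons, in insertion order.
    """
    body = ";".join(f"{name}={value}" for name, value in params.items())
    return f"a=fmtp:{payload_type} {body}"

# Illustrative values only.
example = opus_fmtp_line(111, {
    "ext32-dred-duration": 1000,
    "sprop-ext32-dred-duration": 500,
})
```

   Here example holds the string
   "a=fmtp:111 ext32-dred-duration=1000;sprop-ext32-dred-duration=500".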

4.  Security Considerations

   As is the case for any media codec, the decoder must be robust
   against malicious payloads.  Similarly, the encoder must also be
   robust to malicious audio input since the encoder input can often be
   controlled by an attacker.  That can happen through browser
   JavaScript, echo, or when the encoder is on a gateway.

   DRED is designed to have a complexity that is independent of the
   signal characteristics.  However, there exist implementation details
   that can cause signal-dependent complexity changes.  One example is
   CPU treatment of denormals, which can sometimes cause increased CPU
   load and could be triggered by malicious input.  For that reason, it
   is important to limit such effects in order to reduce the impact of
   denial-of-service (DoS) attacks.  Similarly, since the encoding and
   decoding processes can be computationally costly, devices must
   manage the complexity to avoid attacks that could trigger too much
   DRED encoding or decoding.

   The use of variable-bitrate (VBR) encoding in DRED poses a
   theoretical information leak threat [RFC6562], but that threat is
   believed to be significantly lower than that posed by VBR encoding in
   the main Opus payload.  Since this document provides a way to
   dynamically vary the amount of redundancy transmitted, it is also
   possible to reduce the overall VBR risk of Opus by using DRED as a
   way of making the total Opus payload constant (CBR) or nearly
   constant.
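
   One way to picture the constant-size approach is as a per-packet
   byte budget for DRED.  The sketch below is only arithmetic under
   stated assumptions: the function and parameter names are
   hypothetical, and the framing overhead of the extension, which is
   defined in [opus-extension], is passed in rather than computed.

```python
def dred_byte_budget(target_packet_bytes: int,
                     opus_payload_bytes: int,
                     framing_overhead_bytes: int) -> int:
    """Bytes left for DRED so the packet reaches a fixed target size.

    Hypothetical helper: returns 0 when the main Opus payload plus
    framing already meets or exceeds the target.
    """
    remaining = (target_packet_bytes
                 - opus_payload_bytes
                 - framing_overhead_bytes)
    return max(0, remaining)
```

   An encoder following this idea would spend more bytes on redundancy
   when the main payload is small and fewer when it is large, which
   keeps the total size nearly independent of the signal.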

5.  References

5.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC7587]  Spittka, J., Vos, K., and JM. Valin, "RTP Payload Format
              for the Opus Speech and Audio Codec", RFC 7587,
              DOI 10.17487/RFC7587, June 2015,
              <https://www.rfc-editor.org/info/rfc7587>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

   [RFC6716]  Valin, JM., Vos, K., and T. Terriberry, "Definition of the
              Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716,
              September 2012, <https://www.rfc-editor.org/info/rfc6716>.

   [opus-extension]
              Valin, JM., "Extension Formatting for the Opus Codec
              (draft-valin-opus-extension)", April 2023.

5.2.  Informative References

   [RFC6562]  Perkins, C. and JM. Valin, "Guidelines for the Use of
              Variable Bit Rate Audio with Secure RTP", RFC 6562,
              DOI 10.17487/RFC6562, March 2012,
              <https://www.rfc-editor.org/info/rfc6562>.

   [dred-paper]
              Valin, JM., Buethe, J., and A. Mustafa, "Low-Bitrate
              Redundancy Coding of Speech Using a Rate-Distortion-
              Optimized Variational Autoencoder", 2023,
              <https://arxiv.org/abs/2212.04453>.

Authors' Addresses

   Jean-Marc Valin
   Amazon
   Canada
   Email: jmvalin@amazon.com

   Jan Buethe
   Amazon
   Germany
   Email: jbuethe@amazon.com
