Internet-Draft VoxelVideo Format September 2024
Habib & Joaquin Expires 10 March 2025
Workgroup: WG Working Group
Internet-Draft: draft-habib-voxelvideo-00
Published: September 2024
Intended Status: Standards Track
Expires: 10 March 2025
Authors: D. Habib (True3D Technologies Inc.)
         A. Joaquin (True3D Technologies Inc.)

VoxelVideo Format

Abstract

This document proposes the VoxelVideo format, a file structure designed specifically for the efficient handling, playback, and livestreaming of 3D voxel-based videos. The format is intended for applications in gaming, virtual reality, live sports, and interactive media, providing a robust framework for managing complex 3D data with spatial precision and color fidelity. This document describes the current JSON-based version and outlines future plans to adopt a more efficient, compressed format.

About This Document

This note is to be removed before publishing as an RFC.

The latest revision of this draft can be found at https://example.com/LATEST. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-habib-voxelvideo/.

Discussion of this document takes place on the WG Working Group mailing list (mailto:WG@example.com), which is archived at https://example.com/WG.

Source for this draft and an issue tracker can be found at https://github.com/voxelvideos/voxel-video-file-format.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 10 March 2025.

1. Introduction

The VoxelVideo format addresses the need for an efficient and scalable method to handle, render, and stream 3D voxel-based videos. Existing video formats are not optimized for the distinct characteristics of voxel data, such as spatial precision and color fidelity in three dimensions. The VoxelVideo format is tailored for use in applications like gaming, virtual reality, live sports, and interactive media, where real-time manipulation and playback of complex 3D content are essential. This document describes the JSON-based format and outlines future enhancements, including the adoption of a compressed binary format aimed at improving performance and reducing file sizes.

2. Conventions and Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

This document uses the following terms:

  • Voxel: A three-dimensional pixel representing a point in 3D space with associated attributes such as color [AES].

  • I-frame (Intra-coded frame): A self-contained video frame that fully represents the voxel grid.

  • P-frame (Predicted frame): A video frame that encodes only the differences from previous frames.

3. VoxelVideo Format Overview

The VoxelVideo format utilizes a JSON-based structure to organize 3D voxel data, facilitating efficient playback and manipulation. It is designed to be straightforward to implement, with a focus on clarity and accessibility in its initial version. Future iterations will shift towards a compressed binary format to enhance scalability and performance.

4. File Structure and Key Components

The current structure of a VoxelVideo file is as follows:

export default interface VoxelVideoData {
  Version: number;        // Specifies the version of the file format
  Framerate: number;      // Frames per second
  Framecount: number;     // Total number of frames
  Duration: number;       // Total duration of the video in seconds
  Dimensions: {           // Dimensions of the voxel matrix
    x: number;            // Width in voxels
    y: number;            // Height in voxels
    z: number;            // Depth in voxels
  };
  Title: string;          // Title of the video
  Blocks: (string | null)[][]; // Voxel data, one inner array per frame
}

4.1. Version

The Version field ensures compatibility across different iterations of the format.

4.2. Playback Metadata

Metadata fields like Framerate, Framecount, and Duration provide essential information for correct playback of the video content.
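
As an illustration, the three playback fields can be cross-checked against each other. This document does not state the relationship normatively; the sketch below assumes that Duration equals Framecount divided by Framerate. The names PlaybackMetadata, isPlaybackMetadataConsistent, frameTimestamp, and the toleranceSeconds parameter are hypothetical and not part of the format.

```typescript
// Hypothetical subset of VoxelVideoData containing only the playback fields.
interface PlaybackMetadata {
  Framerate: number;  // Frames per second
  Framecount: number; // Total number of frames
  Duration: number;   // Total duration of the video in seconds
}

// Assumption (not stated in this document): Duration == Framecount / Framerate.
// toleranceSeconds is a hypothetical parameter that absorbs rounding in Duration.
function isPlaybackMetadataConsistent(
  meta: PlaybackMetadata,
  toleranceSeconds: number = 0.001,
): boolean {
  const validRate = meta.Framerate > 0;
  const validCount =
    meta.Framecount >= 0 && Math.floor(meta.Framecount) === meta.Framecount;
  if (!validRate || !validCount) {
    return false;
  }
  const expectedDuration = meta.Framecount / meta.Framerate;
  return Math.abs(expectedDuration - meta.Duration) <= toleranceSeconds;
}

// Time in seconds at which a zero-based frame index starts playing.
function frameTimestamp(meta: PlaybackMetadata, frameIndex: number): number {
  return frameIndex / meta.Framerate;
}
```

A player could run such a check before playback and fall back to the derived duration when the stored value disagrees.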

4.3. Voxel Dimensions

Dimensions defines the size of the voxel space, specifying the width, height, and depth necessary for rendering.

4.4. Title

The Title field serves to identify the content of the video, for easier cataloging and retrieval.

4.5. Blocks Array

The Blocks array represents the 3D voxel grid as a nested array, where each inner array corresponds to a frame of the video. Within each frame, the arrays represent slices of the voxel space at varying depths, providing a structured representation of the 3D voxel data across the entire frame.
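
A small example may help make the nesting concrete. The sketch below follows the declared type of Blocks, so each frame is treated as a single flat array with one entry per voxel. The meaning of the entries is an assumption made only for illustration: a string is taken to be a hex color and null an empty voxel. This document does not define either.

```typescript
// Mirrors the interface given in Section 4.
interface VoxelVideoData {
  Version: number;
  Framerate: number;
  Framecount: number;
  Duration: number;
  Dimensions: { x: number; y: number; z: number };
  Title: string;
  Blocks: (string | null)[][];
}

// Assumptions for illustration only: each frame is one flat array of
// x * y * z entries, a string is a hex color, and null marks an empty voxel.
const example: VoxelVideoData = {
  Version: 0,
  Framerate: 1,
  Framecount: 2,
  Duration: 2,
  Dimensions: { x: 2, y: 1, z: 2 },
  Title: "Two-frame example",
  Blocks: [
    ["#ff0000", null, null, "#0000ff"], // frame 0
    [null, "#ff0000", "#0000ff", null], // frame 1
  ],
};

// Round-trip through JSON, as a writer and a reader of the format would.
const serialized: string = JSON.stringify(example);
const parsed = JSON.parse(serialized) as VoxelVideoData;
```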

5. Frame Types and Future Enhancements

Currently, the VoxelVideo format utilizes I-frames, where each frame is a complete voxel grid independent of other frames. Future versions will include P-frames, which will encode changes between frames to reduce file size and improve streaming efficiency in 3D environments.
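
This document does not yet define a P-frame encoding. Purely to illustrate the idea of encoding changes between frames, the sketch below represents a predicted frame as a list of changed voxels relative to the previous frame. The VoxelChange type and the diffFrames and applyChanges functions are hypothetical, and the sketch assumes each frame is a flat array of voxel entries.

```typescript
type Voxel = string | null;

// Hypothetical delta record: one changed voxel and its new value.
interface VoxelChange {
  index: number; // Position in the flat frame array
  value: Voxel;  // New value at that position
}

// Lists every position where next differs from prev.
function diffFrames(prev: Voxel[], next: Voxel[]): VoxelChange[] {
  if (prev.length !== next.length) {
    throw new Error("frames must contain the same number of voxels");
  }
  const changes: VoxelChange[] = [];
  for (let i = 0; i < next.length; i++) {
    if (prev[i] !== next[i]) {
      changes.push({ index: i, value: next[i] });
    }
  }
  return changes;
}

// Rebuilds the next frame from the previous frame and a list of changes.
function applyChanges(prev: Voxel[], changes: VoxelChange[]): Voxel[] {
  const next = prev.slice(); // Copy so the previous frame stays intact
  for (const change of changes) {
    next[change.index] = change.value;
  }
  return next;
}
```

When few voxels change between consecutive frames, the list of changes is much shorter than a full frame, which is the source of the expected size reduction.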

6. Interpretation of the Blocks Array

To interpret the Blocks array, read the voxel grid plane by plane, starting at the top-left corner of the first plane (height 0), proceeding row by row from left to right. After completing all rows of a plane, move up to the next plane and repeat the process until all planes are read.
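
The reading order above can be sketched as an index calculation. The axis mapping is an assumption, since this document does not name the axes of a plane: planes are taken to be indexed by height (y), rows within a plane by depth (z), and voxels within a row by x, from left to right. The voxelIndex and voxelCoord functions are hypothetical helpers.

```typescript
// Grid size, mirroring the Dimensions field of VoxelVideoData.
interface Dimensions {
  x: number;
  y: number;
  z: number;
}

// Assumed layout (not fixed by this document): planes are indexed by height y,
// rows within a plane by depth z, and voxels within a row by x.
function voxelIndex(dims: Dimensions, x: number, y: number, z: number): number {
  const inRange = (v: number, max: number): boolean =>
    Math.floor(v) === v && v >= 0 && v < max;
  if (!inRange(x, dims.x) || !inRange(y, dims.y) || !inRange(z, dims.z)) {
    throw new RangeError(`voxel (${x}, ${y}, ${z}) is outside the grid`);
  }
  return y * dims.x * dims.z + z * dims.x + x;
}

// Inverse of voxelIndex: recovers (x, y, z) from a position in a flat frame.
function voxelCoord(
  dims: Dimensions,
  index: number,
): { x: number; y: number; z: number } {
  const planeSize = dims.x * dims.z;
  const y = Math.floor(index / planeSize);
  const z = Math.floor((index % planeSize) / dims.x);
  const x = index % dims.x;
  return { x, y, z };
}
```

Under these assumptions, the voxel at (x, y, z) in frame f would be read as Blocks[f][voxelIndex(Dimensions, x, y, z)].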

7. Potential Use Cases

The VoxelVideo format is suitable for scenarios requiring real-time interaction and manipulation of 3D video content. Use cases include immersive virtual reality environments where live 3D voxel video streaming can deliver spatially precise and dynamic visual data. The format is also applicable in education simulations that require high-fidelity 3D visualizations for teaching concepts in fields such as medicine, engineering, and environmental science. Interactive gaming environments can utilize the VoxelVideo format to represent 3D voxel data, allowing for the creation of fully manipulable virtual worlds that support complex interactions within the game environment.

In addition, the VoxelVideo format may be used in live sports broadcasting to generate 3D replays and visualizations, enabling viewers to observe events from various perspectives and analyze the content interactively. This capability can provide additional insights beyond traditional 2D video formats.

The VoxelVideo format supports livestreaming using Dynamic Adaptive Streaming over HTTP (DASH), similar to its application for 2D videos. This approach allows for efficient and scalable delivery of 3D voxel video content across various devices and network conditions by segmenting videos into smaller parts and encoding each segment at different quality levels.
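
The segmentation step can be sketched as follows. This is not a DASH implementation: it produces no manifest and does not encode multiple quality levels, and this document does not define a segment container. It only shows how a sequence of frames might be divided into fixed-duration segments. The FrameSegment type, the planSegments function, and the segmentSeconds parameter are hypothetical.

```typescript
// Hypothetical description of one segment of consecutive frames.
interface FrameSegment {
  startFrame: number; // Zero-based index of the first frame in the segment
  frameCount: number; // Number of frames in the segment
  startTime: number;  // Presentation time of the first frame, in seconds
}

// Splits framecount frames into segments of roughly segmentSeconds each.
// The last segment may be shorter than the others.
function planSegments(
  framecount: number,
  framerate: number,
  segmentSeconds: number,
): FrameSegment[] {
  if (!(framerate > 0) || !(segmentSeconds > 0)) {
    throw new RangeError("framerate and segmentSeconds must be positive");
  }
  const framesPerSegment = Math.max(1, Math.round(framerate * segmentSeconds));
  const segments: FrameSegment[] = [];
  for (let start = 0; start < framecount; start += framesPerSegment) {
    segments.push({
      startFrame: start,
      frameCount: Math.min(framesPerSegment, framecount - start),
      startTime: start / framerate,
    });
  }
  return segments;
}
```

Because the current format uses only I-frames, any frame boundary is a valid segment boundary.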

8. Future Directions

The current version of the VoxelVideo format (version 0.0) uses a JSON-based structure designed for simplicity and ease of use, facilitating initial development and experimentation. Future iterations will focus on enhancing performance and scalability by transitioning to a compressed binary format, aiming to reduce file sizes and improve data storage and retrieval efficiency. Additionally, future versions will explore the implementation of P-frames to encode changes between frames, further optimizing file size and playback performance. These enhancements are intended to expand the format's capabilities, providing a more robust solution for managing and delivering 3D voxel-based video content at scale.

9. Security Considerations

This document does not define security mechanisms for the VoxelVideo format. Implementers should nevertheless consider the security implications of processing untrusted voxel data. In particular, implementers should account for worst-case computational complexity, memory usage, and processing resources when handling voxel data, especially in contexts like livestreaming where inputs may be dynamic and unverified.
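
One way to bound resource usage is to validate the declared sizes before allocating or rendering anything. The sketch below is illustrative only: the limits MAX_FRAMES and MAX_VOXELS_PER_FRAME are hypothetical values, the validateUntrusted function is not part of the format, and each frame is assumed to be a flat array with one entry per voxel.

```typescript
// Hypothetical resource limits; a real deployment would choose its own.
const MAX_FRAMES = 100000;
const MAX_VOXELS_PER_FRAME = 256 * 256 * 256;

function isPositiveInteger(v: unknown): v is number {
  return typeof v === "number" && isFinite(v) && Math.floor(v) === v && v > 0;
}

// Returns a list of problems found; an empty list means the checks passed.
function validateUntrusted(data: unknown): string[] {
  if (typeof data !== "object" || data === null) {
    return ["top-level value must be an object"];
  }
  const d = data as { [key: string]: unknown };
  const errors: string[] = [];

  const framecount = d["Framecount"];
  if (!isPositiveInteger(framecount) || framecount > MAX_FRAMES) {
    errors.push("Framecount must be a positive integer within the frame limit");
  }

  let voxelsPerFrame = 0; // Stays 0 when Dimensions is unusable
  const dims = d["Dimensions"];
  if (typeof dims !== "object" || dims === null) {
    errors.push("Dimensions must be an object");
  } else {
    const { x, y, z } = dims as { [key: string]: unknown };
    if (!isPositiveInteger(x) || !isPositiveInteger(y) || !isPositiveInteger(z)) {
      errors.push("Dimensions.x, y and z must be positive integers");
    } else if (x * y * z > MAX_VOXELS_PER_FRAME) {
      errors.push("Dimensions exceed the per-frame voxel limit");
    } else {
      voxelsPerFrame = x * y * z;
    }
  }

  const blocks = d["Blocks"];
  if (!Array.isArray(blocks)) {
    errors.push("Blocks must be an array");
  } else {
    if (isPositiveInteger(framecount) && blocks.length !== framecount) {
      errors.push("Blocks must contain exactly Framecount frames");
    }
    // Assumption: each frame is a flat array holding one entry per voxel.
    for (let i = 0; i < blocks.length && voxelsPerFrame > 0; i++) {
      const frame = blocks[i];
      if (!Array.isArray(frame) || frame.length !== voxelsPerFrame) {
        errors.push(`frame ${i} must be an array of ${voxelsPerFrame} entries`);
        break; // One report is enough; avoid flooding on hostile input
      }
    }
  }
  return errors;
}
```

These checks run after the JSON text has been parsed, so a cap on the size of the input itself would still be needed before parsing.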

10. IANA Considerations

This document has no IANA actions.

11. References

11.1. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/rfc/rfc2119>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/rfc/rfc8174>.

11.2. Informative References

[AES]
Shchurova, C. I., "A methodology to design a 3D graphic editor for micro-modeling of fiber-reinforced composite parts", Journal Advances in Engineering Software, Volume 90, December 2015, Pages 76-82.

Authors' Addresses

Daniel Habib
True3D Technologies Inc.
15 Morris Ave
Apt. 307
Long Branch, NJ, 07740
United States of America

Alyssa Joaquin
True3D Technologies Inc.