Internet Congestion Control Research Group                   L. Han, Ed.
Internet-Draft                                       Huawei Technologies
Intended status: Informational                                  K. Smith
Expires: September 13, 2017                                     Vodafone
                                                          March 12, 2017


 Problem Statement: Transport Support for Augmented and Virtual Reality
                              Applications
               draft-han-iccrg-arvr-transport-problem-01

Abstract

   As emerging technologies, Augmented Reality (AR) and Virtual
   Reality (VR) pose many challenges to technologies such as
   information display, image processing, fast computing and
   networking.  This document analyzes the requirements that AR and VR
   place on networking, and especially on transport protocols.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 13, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of



Han & Smith            Expires September 13, 2017               [Page 1]


Internet-Draft     Transport Support for AR/VR Problem        March 2017


   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
     1.1.  Scope . . . . . . . . . . . . . . . . . . . . . . . . . .   3
   2.  Terminology . . . . . . . . . . . . . . . . . . . . . . . . .   4
     2.1.  Definitions . . . . . . . . . . . . . . . . . . . . . . .   4
   3.  Problem Statement . . . . . . . . . . . . . . . . . . . . . .   6
   4.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   9
   5.  Security Considerations . . . . . . . . . . . . . . . . . . .  10
   6.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  10
   7.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  10
     7.1.  Normative References  . . . . . . . . . . . . . . . . . .  10
     7.2.  Informative References  . . . . . . . . . . . . . . . . .  10
   Appendix A.  Key Factors for Network-Based AR/VR  . . . . . . . .  12
     A.1.  Latency Requirements  . . . . . . . . . . . . . . . . . .  12
       A.1.1.  Motion to Photon (MTP) Latency  . . . . . . . . . . .  12
       A.1.2.  Latency Budget  . . . . . . . . . . . . . . . . . . .  13
     A.2.  Throughput Requirements . . . . . . . . . . . . . . . . .  15
       A.2.1.  Average Throughput  . . . . . . . . . . . . . . . . .  15
       A.2.2.  Peak Throughput . . . . . . . . . . . . . . . . . . .  19
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  20

1.  Introduction

   Virtual Reality (VR) and Augmented Reality (AR) technologies have
   enormous potential in many different fields, such as entertainment,
   remote diagnosis, or remote maintenance.  AR and VR applications aim
   to cause users to perceive that they are physically present in a non-
   physical or partly non-physical world.  However, slightly unrealistic
   artefacts not only distract from the sense of immersion, but they can
   also cause `VR sickness' [VR-Sickness] by confusing the brain
   whenever information about the virtual environment is good enough to
   be believable but not wholly consistent.

   This document is based on the assumption and prediction that the
   current localized AR/VR will inevitably evolve to cloud-based
   AR/VR, since cloud processing and state will be able to supplement
   local AR/VR devices, helping to reduce their size and power
   consumption, and providing much richer content and flexibility to
   AR/VR applications.

   Sufficient realism requires both very low latency and a very high
   information rate.  In addition, the information rate varies
   significantly and can include large bursts.  This problem statement
   aims to quantify these requirements, which are largely driven by the



Han & Smith            Expires September 13, 2017               [Page 2]


Internet-Draft     Transport Support for AR/VR Problem        March 2017


   video component of the transmission.  The ambition is to improve
   Internet technology so that AR/VR applications can create the
   impression of remote presence over longer distances.

   The goal is for the Internet to be able to routinely satisfy these
   demanding requirements in 5-10 years.  Then it will become feasible
   to launch many new applications, using AR/VR technology in various
   arrangements as a new platform over the Internet.  A 5-10-year
   horizon is considered appropriate, given that it can take 1-2 years
   to socialize a grand challenge in the IRTF/IETF, and then 2-3 years
   for standards documents to be drafted and taken through the RFC
   process.
   The technology itself will also take a few years to develop and
   deploy.  That is likely to run partly in parallel to standardization,
   so the IETF will need to be ready to intervene wherever
   interoperability is necessary.

1.1.  Scope

   This document is aimed at the transport area research community.
   However, initially, advances at other layers are likely to make the
   greatest inroads into the problem, for example:

   o  Network architecture: making the physical distance between the
      AR/VR content cloud and its users short enough to limit the
      latency caused by propagation delay in the physical media

   o  Motion sensors: reduction in latency for range of interest (RoI)
      detection

   o  Sending app: better targeted degradation of quality below the
      threshold of human perception, e.g. outside the range of interest

   o  Sending app: better coding and compression algorithms

   o  Access network: multiplexing bursts further down the layers and
      therefore between more users, e.g. traffic-dependent scheduling
      between layer-2 flows rather than layer-3 flows

   o  Core network: ensuring the capacity of the core network is
      sufficient to transport AR/VR traffic across different service
      providers.

   o  Receiving app: better decoding and prediction algorithms

   o  Head mounted displays (HMDs): reducing display latency

   The initial aim is to state the problem in terms of raw information
   rates and delays.  This initial draft can then form the basis of



Han & Smith            Expires September 13, 2017               [Page 3]


Internet-Draft     Transport Support for AR/VR Problem        March 2017


   discussions with experts in other fields, to quantify how much of the
   problem they are likely to be able to remove.  Then subsequent drafts
   can better quantify the size of the remaining transport problem.

   This document focuses on unicast-based AR/VR, which covers a wide
   range of applications, such as VR gaming, shopping, surgery, etc.
   Broadcast/multicast-based AR/VR, which is intended for live or
   multi-user events such as sports broadcasts or online education, is
   outside the scope of this document; it is likely to need further
   supporting technology such as multicast, caching and edge
   computing.  The idea there is to use panoramic streaming
   technologies so that users can dynamically select different view
   points and angles to become immersed in different real-time video
   streams.

   Our intention is not to promote enhancement of the Internet
   specifically for AR/VR applications.  Rather, AR/VR is selected as
   a concrete example that encompasses a fairly wide set of
   applications.
   expected that an Internet that can support AR/VR will be able to
   support other applications requiring both high throughput and low
   latency, such as interactive video.  It should be able to support
   applications with more demanding latency requirements, but perhaps
   only over shorter distances.  For instance, low latency is needed for
   vehicle to everything (V2X) communication, for example between
   vehicles on roads, or between vehicles and remote cloud computing.
   Tactile communication has very demanding latency needs, perhaps as
   low as 1 ms.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.1.  Definitions

   E2E
         End-to-end

   HMD
         Head-Mounted Display or Device

   AR
         Augmented Reality (AR) is a live direct or indirect view of a
         physical, real-world environment whose elements are augmented
         (or supplemented) by computer-generated sensory input such as
         sound, video, graphics or GPS data.  It is related to a more
         general concept called mediated reality, in which a view of



Han & Smith            Expires September 13, 2017               [Page 4]


Internet-Draft     Transport Support for AR/VR Problem        March 2017


         reality is modified (possibly even diminished rather than
         augmented) by a computer

   VR
         Virtual Reality (VR) is a computer technology that uses
         software-generated realistic images, sounds and other
         sensations to replicate a real environment or an imaginary
         setting, and simulates a user's physical presence in this
         environment to enable the user to interact with this space

   FOV
         Field of View is the extent of the world that is visible
         without eye movement, measured in degrees of visual angle in
         the vertical and horizontal planes

   Panorama
         Panorama is any wide-angle view or representation of a physical
         space, whether in painting, drawing, photography, film, seismic
         images or a three-dimensional model

   360 degree video
         360-degree videos, also known as immersive videos or spherical
         videos, are video recordings where a view in every direction is
         recorded at the same time, shot using an omnidirectional camera
          or a collection of cameras.  Most 360-degree video is
          monoscopic (2D), meaning that it is viewed as a single
          (360x180 equirectangular) image directed to both eyes.
          Stereoscopic video (3D) is viewed as two distinct (360x180
          equirectangular) images directed individually to each eye.
          360-degree videos are typically viewed via personal
          computers, mobile devices such as smartphones, or dedicated
          HMDs

   MTP and MTP Latency
         Motion-To-Photon.  Motion-to-Photon latency is the time needed
         for a user movement to be fully reflected on a display screen
         [MTP-Latency].

   Unmanaged
         For the purpose of this document, if an unmanaged Internet
         service supports AR/VR applications, it means that basic
         connectivity provides sufficient support without requiring the
         application or user to separately request any additional
         service, even as a once-off request.








Han & Smith            Expires September 13, 2017               [Page 5]


Internet-Draft     Transport Support for AR/VR Problem        March 2017


3.  Problem Statement

   Network-based AR/VR applications need both low latency and high
   throughput.  We shall see that the ratio of peak to mean bit-rate
   makes it challenging to hit both targets.  To satisfy extreme delay
   and throughput requirements as a niche service for a few special
   users would probably be possible but challenging.  This document
   envisages an even more challenging scenario: supporting AR/VR usage
   as a routine service for the mass market in the future.  This would
   either need the regular unmanaged Internet service to support both
   low latency and high throughput, or it would need managed Internet
   services to be so simple to activate that they would be universally
   accessible.

   Each element of the above requirements is expanded and quantified
   briefly below.  The figures used are justified in depth in
   Appendix A.

   MTP Latency:  AR/VR developers generally agree that MTP latency
      becomes imperceptible below about 20 ms [Carmack13].  However,
      some research has concluded that MTP latency must be less than
      17 ms for sensitive users [MTP-Latency-NASA].  Experience has
      shown that standards bodies tend to set demanding quality
      levels, while motivated humans often happily adapt to lower
      quality, although they then struggle with more demanding tasks.
      Therefore, to be clear, this 20 ms requirement is designed to
      enable immersive interaction for the same wide range of tasks
      that people are used to undertaking locally.

   Latency Budget:  If the only component of delay were speed-of-light
      propagation, a 20 ms round trip would limit the physical
      distance between the communicating parties to 3,000 km in air or
      2,000 km in glass.
      We cannot expand the physical scope of an AR/VR application beyond
      this speed-of-light limit.  However, we can ensure that
      application processing and transport-related delays do not
      significantly reduce this limited scope.  As a rule of thumb they
      should consume no more than 5-10% (1-2 ms) of this 20 ms budget,
      and preferably less.  See Appendix A.1 for the derivation of these
      latency requirements.












Han & Smith            Expires September 13, 2017               [Page 6]


Internet-Draft     Transport Support for AR/VR Problem        March 2017


   +--------------+-------------+----------+-------------+-------------+
   |              | Entry-level | Advanced | Ultimate 2D | Ultimate 3D |
   +--------------+-------------+----------+-------------+-------------+
   | Video Type   | 4K 2D       | 12K 2D   | 24K 2D      | 24K 3D      |
   |              |             |          |             |             |
   | Mean bit     | 22 Mb/s     | 400 Mb/s | 2.9 Gb/s    | 3.3 Gb/s    |
   | rate         |             |          |             |             |
   | Peak bit     | 130 Mb/s    | 1.9 Gb/s | 29 Gb/s     | 38 Gb/s     |
   | rate         |             |          |             |             |
   | Burst time   | 33 ms       | 17 ms    | 8 ms        | 8 ms        |
   +--------------+-------------+----------+-------------+-------------+

   Table 1: Raw information rate requirements for various levels of AR/
                            VR (YUV 420, H.265)

   Raw information rate:  Table 1 summarizes the mean and peak raw
      information rates for four types of H.265 video.  Not only does
      the raw information rate rise to very demanding levels, even for
      12K 'Advanced' AR/VR, but the ratio of peak to mean also
      increases, from about 6 for 'Entry-Level' AR/VR to nearly 12 for
      'Ultimate 3D' AR/VR.  See Appendix A.2 for more details and the
      derivation of these rate requirements.

   Buffer constraint:  It would be extremely inefficient (and
      therefore costly) to provide sufficient capacity for the bursts.
      If the latency constraint were not so tight, it would be more
      efficient to provide less capacity than the peak rate and buffer
      the bursts (in the network and/or the hosts).  However, even if
      capacity were only provided for 1/k of the peak bit rate,
      play-out would be delayed by (k-1) times the burst time.  For
      instance, if a 1 Gb/s link were provided for 'Advanced' AR/VR,
      then k = 1.9, and play-out would be delayed by
      (1.9 - 1) * 17 ms = 15 ms.  This would consume 75% of our 20 ms
      delay budget.  Therefore, it seems that capacity sufficient for
      the peak rate will be needed, with no buffering.  We then have
      to rely on application-layer innovation to reduce the peak bit
      rate.
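
      As a Python sketch of the arithmetic above (the figures are
      copied from Table 1):

         # Play-out delay when capacity is 1/k of the peak rate.
         # 'Advanced' AR/VR figures from Table 1.
         peak_bps     = 1.9e9  # peak bit rate
         capacity_bps = 1.0e9  # provisioned link capacity
         burst_s      = 0.017  # burst (I-frame) time

         k = peak_bps / capacity_bps  # 1.9
         delay_s = (k - 1) * burst_s  # extra play-out delay
         print(delay_s * 1000)  # ~15 ms, 75% of the 20 ms budget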

   Simultaneous bursts:  One way to deal with such a high peak-to-mean
      ratio would be to multiplex multiple AR/VR sessions within the
      same capacity.  This problem statement assumes that the bursts are
      not correlated at the application layer.  Then the probability
      that most sessions burst simultaneously would become tiny.  This
      would be useful for the high degree of statistical multiplexing in
      a core network, but it would be less useful in access networks,
      which is where the bottleneck usually is, and where the number of
      AR/VR sessions in the same bottleneck might often be close to 1.
      Of course, if the bursts are correlated between different users,
      there will be no multiplexing gain.
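
      The following Python sketch gives a feel for this multiplexing
      argument.  The GOP period of one burst per second is purely our
      illustrative assumption, not a figure from this document:

         from math import comb

         # Probability that at least m of n independent AR/VR
         # sessions are bursting at the same instant.  Assumption
         # (illustrative only): each session sends one 17 ms I-frame
         # burst per 1 s GOP, so it bursts 1.7% of the time.
         p = 0.017

         def p_at_least(m, n):
             return sum(comb(n, k) * p**k * (1 - p)**(n - k)
                        for k in range(m, n + 1))

         print(p_at_least(10, 100))  # core, 100 sessions: ~1e-5
         print(p_at_least(2, 2))     # access, 2 sessions:  ~3e-4
         # With only 1 or 2 sessions in an access bottleneck,
         # capacity near the full peak rate is needed regardless.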



Han & Smith            Expires September 13, 2017               [Page 7]


Internet-Draft     Transport Support for AR/VR Problem        March 2017


   Problems with Unmanaged TCP Service:  An unmanaged TCP solution would
      probably use some derivative of TCP congestion control [RFC5681]
      to adapt to the available capacity.  The following problems with
      TCP congestion control would have to be solved:

      Transmission loss and throughput:  TCP algorithms collectively
         induce a low level of loss, and the lower the loss, the
         faster they go.  Whatever TCP algorithm is used, TCP
         throughput is capped by parameters such as the RTT and the
         packet loss ratio; in particular, TCP throughput is always
         lower than the physical link capacity.  So, for a single flow
         to attain the bit rates shown in Table 1 requires a loss
         probability so low that it could be physically limited by the
         bit-error probability experienced over optical fiber links
         (see the sketch at the end of this list).  The analysis in
         [I-D.ietf-tcpm-cubic] tabulates the packet loss ratios needed
         to sustain various TCP throughputs.

      Flow-rate equality:

         Host-Controlled:  TCP ensures rough equality between L4 flow
            rates as a simple way to ensure that no individual flow is
            starved when others are not [RFC5290].  Consider a
            scenario where one user has a dedicated 2 Gb/s access
            line, and they are running an AR/VR application that needs
            a minimum of 400 Mb/s.  If the AR/VR app used TCP, it
            would fail whenever the user (or their family) happened to
            start more than 4 other long-lived TCP flows at once, e.g.
            FTP flows: with 5 competing flows, the app's equal share
            would be at most 2 Gb/s / 6, or about 333 Mb/s (see the
            sketch at the end of this list).  This simple example
            shows that flow-rate equality will probably need to be
            relaxed to enable support for AR/VR as part of the regular
            unmanaged Internet service.  Fortunately, when there is
            enough capacity for one flow to get 400 Mb/s, every flow
            does not have to get 400 Mb/s to ensure that no-one
            starves.  This line of logic could allow flow-rate
            equality to be relaxed in transport protocols like TCP.

         Network-Enforced:  However, if parts of the network were
            enforcing flow-rate equality, relaxing it would be much
            more difficult.  For instance, deployment of the per-flow
            queuing scheduler in fq_CoDel [I-D.ietf-aqm-fq-codel]
            would introduce this problem.

      Dynamics:  The bursts shown in Table 1 would be problematic for
         TCP.  It is hard for the throughput of one TCP flow to jump an
         order of magnitude for one or two round trips, and even harder
         for other TCP flows to yield over the same time-scale without
         considerable queuing delay and/or loss.
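
      The scale of both of these problems can be illustrated with the
      well-known square-root response function of Reno-like TCP,
      rate ~= 1.22 * MSS / (RTT * sqrt(p)).  The Python sketch below
      uses assumed values for the segment size and RTT:

         # Loss probability needed for one Reno-like TCP flow to
         # sustain a target rate, from
         # rate = 1.22 * MSS / (RTT * sqrt(p)), i.e.
         # p = (1.22 * MSS / (RTT * rate))**2.
         mss_bits   = 1460 * 8  # assumed 1460-byte segments
         rtt_s      = 0.020     # assumed 20 ms round-trip time
         target_bps = 400e6     # 'Advanced' mean rate, Table 1

         p = (1.22 * mss_bits / (rtt_s * target_bps)) ** 2
         print(p)  # ~3.2e-6: about one loss per 300,000 packets,
                   # close to residual fiber bit-error rates

         # Equal flow-rate share on a 2 Gb/s access line when the
         # AR/VR flow competes with n other long-lived flows.
         for n in range(3, 7):
             print(n, 2e9 / (n + 1) >= 400e6)  # False once n >= 5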



Han & Smith            Expires September 13, 2017               [Page 8]


Internet-Draft     Transport Support for AR/VR Problem        March 2017


   Problems with Unmanaged UDP Service:  Using UDP as the transport
      cannot solve the problems faced by TCP.  Fundamentally, an IP
      network can only provide a best-effort service, whether the
      transport on top of IP is TCP or UDP.  Most network devices use
      some variation of a "fair queuing" algorithm to schedule IP
      flows without awareness of the transport protocol, and as long
      as a fair queuing algorithm is used, a UDP flow cannot obtain
      more bandwidth or lower latency than other flows.  However,
      using UDP may reduce the burden of retransmitting lost packets,
      if a lost packet is not critical (such as a non-I-frame) or has
      already passed its play-out time.  Depending on whether its own
      congestion control is added, current UDP service comes in two
      types:

      UDP with congestion control:  QUIC is a typical UDP-based
         service with congestion control.  The congestion control
         algorithm used in QUIC is similar to TCP CUBIC, so QUIC also
         behaves similarly to TCP CUBIC.  There is no fundamental
         difference from an unmanaged TCP service in terms of
         fairness, convergence, bandwidth utilization, etc.

      UDP without congestion control:  If UDP is used as the transport
         without additional congestion control, it will be even less
         able than a congestion-controlled transport to support AR/VR
         applications' combined high-throughput and low-latency
         requirements.

   Problems with Managed Service:  As well as the common problems
      outlined above, such as simultaneous bursts, the management and
      policy aspects of a managed QoS solution are problematic:

      Complex provisioning:  Currently QoS services are not
         straightforward to enable, which would make routine widespread
         support of AR/VR unlikely.  It has proved particularly hard to
         standardize how managed QoS services are enabled across host-
         network and inter-domain interfaces.

      Universality:  For AR/VR support to become widespread and routine,
         control of QoS provision would need to comply with the relevant
         Net Neutrality [NET_Neutrality_ISOC] legislation appropriate to
         the jurisdictions covering each part of the network path.

4.  IANA Considerations

   This document has no IANA actions.








Han & Smith            Expires September 13, 2017               [Page 9]


Internet-Draft     Transport Support for AR/VR Problem        March 2017


5.  Security Considerations

   This document introduces no new security issues.

6.  Acknowledgements

   Special thanks to Bob Briscoe, who gave a great deal of advice and
   many comments during the study and writing of this draft, and who
   also did much revision of the final draft.

   We would like to thank Kjetil Raaen and Steve Appleby for comments
   on early drafts of this work.

   We would also like to thank Huawei's research team, led by Lei Han,
   Feng Li and Yue Yin, for providing the prospective analysis, and
   Guoping Li, Boyan Tu, Xuefei Tang and Tao Ma from Huawei for their
   involvement in the discussion of this work.

   Lastly, we want to thank Huawei's Information Lab; some basic AR/VR
   data came from its research results.

7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <http://www.rfc-editor.org/info/rfc2119>.

7.2.  Informative References

   [Carmack13]
              Carmack, J., "Latency Mitigation Strategies", February
              2013, <https://www.twentymilliseconds.com/post/latency-
              mitigation-strategies/>.

   [Chroma]   Wikipedia, "Chroma subsampling", 2016,
              <https://en.wikipedia.org/wiki/Chroma_subsampling>.

   [Fiber-Light-Speed]
              Kevin Miller, "Calculating Optical Fiber Latency", 2012,
              <http://www.m2optics.com/blog/bid/70587/
              Calculating-Optical-Fiber-Latency>.

   [GOP]      Wikipedia, "Group of pictures", 2016,
              <https://en.wikipedia.org/wiki/Group_of_pictures>.




Han & Smith            Expires September 13, 2017              [Page 10]


Internet-Draft     Transport Support for AR/VR Problem        March 2017


   [H264_Primer]
              Adobe, "H.264 Primer", 2016, <http://wwwimages.adobe.com/c
              ontent/dam/Adobe/en/devnet/video/articles/h264_primer/
              h264_primer.pdf>.

   [I-D.ietf-aqm-fq-codel]
              Hoeiland-Joergensen, T., McKenney, P.,
              dave.taht@gmail.com, d., Gettys, J., and E. Dumazet, "The
              FlowQueue-CoDel Packet Scheduler and Active Queue
              Management Algorithm", draft-ietf-aqm-fq-codel-06 (work in
              progress), March 2016.

   [I-D.ietf-tcpm-cubic]
              Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and
              R. Scheffenegger, "CUBIC for Fast Long-Distance Networks",
              draft-ietf-tcpm-cubic-04 (work in progress), February
              2017.

   [MTP-Latency]
              Kostov, G., "Fostering Player Collaboration Within a
              Multimodal Co-Located Game", University of Applied
              Sciences Upper Austria, Masters Thesis , September 2015,
              <https://www.researchgate.net/publication/291516650_Foster
              ing_Player_Collaboration_Within_a_Multimodal_Co-
              Located_Game>.

   [MTP-Latency-NASA]
              Adelstein, B., et al., NASA Ames Research Center, "Head
              Tracking Latency in Virtual Environments: Psychophysics
              and a Model", 2003,
              <https://humansystems.arc.nasa.gov/publications/
              Adelstein_2003_Head_Tracking_Latency.pdf>.

   [NET_Neutrality_ISOC]
              Internet Society, "Network Neutrality, An Internet Society
              Public Policy Briefing", 2015,
              <http://www.internetsociety.org/sites/default/files/
              ISOC-PolicyBrief-NetworkNeutrality-20151030-nb.pdf>.

   [PSNR]     Wikipedia, "Peak signal-to-noise ratio", 2016,
              <https://en.wikipedia.org/wiki/Peak_signal-to-
              noise_ratio>.

   [Raaen16]  Raaen, K., "Response time in games : requirements and
              improvements", University of Oslo, PhD Thesis , February
              2016, <http://home.ifi.uio.no/paalh/students/
              KjetilRaaen-phd.pdf>.




Han & Smith            Expires September 13, 2017              [Page 11]


Internet-Draft     Transport Support for AR/VR Problem        March 2017


   [RFC5290]  Floyd, S. and M. Allman, "Comments on the Usefulness of
              Simple Best-Effort Traffic", RFC 5290,
              DOI 10.17487/RFC5290, July 2008,
              <http://www.rfc-editor.org/info/rfc5290>.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
              <http://www.rfc-editor.org/info/rfc5681>.

   [VR-Sickness]
              Wikipedia, "Virtual reality sickness", 2016,
              <https://en.wikipedia.org/wiki/
              Virtual_reality_sickness#cite_note-one-1>.

   [YUV]      Wikipedia, "YUV", 2016, <https://en.wikipedia.org/wiki/
              YUV>.

Appendix A.  Key Factors for Network-Based AR/VR

A.1.  Latency Requirements

A.1.1.  Motion to Photon (MTP) Latency

   Latency is the most important quality parameter for AR/VR
   applications.  With streaming video, caching technology located
   closer to the user can reduce speed-of-light delays.  In contrast,
   with AR/VR, user actions are interactive and rarely predictable: at
   any time a user can turn the HMD to any angle or take any other
   action in response to virtual-reality events.

   AR/VR developers generally agree that MTP latency becomes
   imperceptible below about 20 ms [Carmack13].  However, some research
   has concluded that MTP latency must be less than 17 ms for
   sensitive users [MTP-Latency-NASA].  For a summary of numerous
   references
   concerning the limit of human perception of delay see the thesis of
   Raaen [Raaen16].

   Latency greater than 20 ms not only degrades the visual experience,
   but also tends to result in Virtual Reality Sickness [VR-Sickness].
   Also known as cybersickness, this can cause symptoms similar to
   motion sickness or simulator sickness, such as general discomfort,
   headache, nausea, vomiting, disorientation, etc.

   According to sensory conflict theory, sickness can occur when a
   user's perception of self-motion is based on inconsistent sensory
   inputs between the visual system, vestibular (balance) system, and
   non-vestibular proprioceptors (muscle spindles), particularly when
   these inputs are at odds with the user's expectations from prior



Han & Smith            Expires September 13, 2017              [Page 12]


Internet-Draft     Transport Support for AR/VR Problem        March 2017


   experience.  Sickness can be minimized by keeping MTP latency below
   the threshold where humans can detect the lag between visual input
   and self-motion.

   The best localized AR/VR systems have significantly improved the
   speed of sensor detection, display refresh and GPU processing in
   their head-mounted displays (HMDs), bringing MTP latency below
   20 ms.  However, network-based AR/VR research has only just
   started.

A.1.2.  Latency Budget

   Figure 1 illustrates the main components of E2E delay in network-
   based AR/VR.

     +------+            +------+             +------+
     |  T1  |----------->|  T4  |------------>|  T2  |
     +------+            +------+             +------+
                                                  |
                                                  |
                                                  |
     +------+                                     |
     |  T6  |                                     |
     +------+                                     |
        ^                                         |
        |                                         |
        |                                         v
     +------+            +------+             +------+
     |  T5  |<-----------|  T4  |<------------|  T3  |
     +------+            +------+             +------+

      T1:  Sensor detection and Action capture
      T2:  Computation for ROI (Range of Interest) processing, rendering
           and encoding
      T3:  GOP (group of pictures) framing and streaming
      T4:  Network transport
      T5:  Terminal decoding
      T6:  Screen refresh

     Figure 1: The main components of E2E delay in network-based AR/VR

   Table 2 shows approximate current values and projected values for
   each component of E2E delay, based on likely technology advances in
   hardware and software.

   The current network transport latency consists of the physical
   propagation delay and the switching/forwarding delay at each
   network device.



Han & Smith            Expires September 13, 2017              [Page 13]


Internet-Draft     Transport Support for AR/VR Problem        March 2017


   1.  The physical propagation delay: This is the delay caused by the
   finite speed at which signals travel through physical media.
   Taking fiber as an example, optical transmission cannot exceed the
   speed of light: about 300 km/ms in free space.  Light moving
   through a fiber-optic core travels more slowly than light through a
   vacuum, because of the higher refractive index of glass compared
   with free space.  In normal optical fiber, the speed of light is
   about 200 km/ms [Fiber-Light-Speed].

   2.  The switching/forwarding delay: This delay is normally much
   greater than the physical propagation delay; it can vary from
   200 us to 200 ms at each hop.

          +---------+--------------------+----------------------+
          | Latency | Current value (ms) | Projected value (ms) |
          +---------+--------------------+----------------------+
          |    T1   |         1          |          1           |
          |    T2   |         11         |          2           |
          |    T3   |    110 to 1000     |          5           |
          |    T4   |     0.2 to 100     |          ?           |
          |    T5   |         5          |          5           |
          |    T6   |         1          |         0.01         |
          |         |                    |                      |
          |   MTP   |    130 to 1118     |        13 + ?        |
          +---------+--------------------+----------------------+

                          MTP = T1+T2+T3+T4+T5+T6

   Table 2: Current and projected latency in key stages in network based
                                   AR/VR

   We can see that MTP latency is currently much greater than 20 ms.

   If we project that technological advances will bring down the
   latency in some areas, for example using improved parallel hardware
   processing to cut the latency of GOP framing and streaming
   dramatically, down to 5 ms, and using OLED displays to cut the
   display response time (refresh latency) to about 0.1 us, then the
   budget for the round-trip network transport latency (T4) will be
   about 5 to 7 ms.
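
   The following Python sketch checks the MTP arithmetic of Table 2
   and the resulting round-trip transport budget:

      # MTP components from Table 2, in ms, as tuples of
      # (current-min, current-max, projected).
      components = [
          (1,   1,    1),     # T1 sensor/action capture
          (11,  11,   2),     # T2 ROI processing etc.
          (110, 1000, 5),     # T3 GOP framing and streaming
          (0.2, 100,  None),  # T4 network transport (unknown)
          (5,   5,    5),     # T5 terminal decoding
          (1,   1,    0.01),  # T6 screen refresh
      ]
      print(sum(c[0] for c in components))  # 128.2, ~130 in Table 2
      print(sum(c[1] for c in components))  # 1118
      fixed = sum(c[2] for c in components if c[2] is not None)
      print(20 - fixed)  # ~6.99 ms left for T4, round trip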

   This budget will be consumed by propagation delay, switching delay
   and queuing delay.  We can conclude:

   1.  The physical distance between the user and the AR/VR server is
   limited, and must be less than 1000 km.  So the AR/VR server SHOULD
   be deployed as close to the user as possible.




Han & Smith            Expires September 13, 2017              [Page 14]


Internet-Draft     Transport Support for AR/VR Problem        March 2017


   2.  The total delay budget for network devices will be in the low
   single digits of milliseconds.  For example, if the distance
   between the user and the AR/VR server is 600 km, then the maximum
   accumulated delay (round trip) allowed for all network devices is
   about 2 to 4 ms.  This is equivalent to 1 to 2 ms of delay in one
   direction across all network devices on the path.
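
   The propagation arithmetic used in Section 3 and in this appendix
   can be checked with the following Python sketch:

      # Distance reachable if the whole 20 ms MTP budget were spent
      # on round-trip propagation alone (cf. Section 3), plus the
      # one-way fiber delay for the 600 km example above.
      budget_rtt_ms = 20.0
      for medium, km_per_ms in [("free space", 300.0),
                                ("fiber", 200.0)]:
          print(medium, km_per_ms * budget_rtt_ms / 2)  # 3000, 2000
      print(600 / 200.0)  # 3.0 ms one way over 600 km of fiber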

A.2.  Throughput Requirements

   The network bandwidth required for AR/VR is the actual throughput
   required by the application; for instance, if the AR/VR stream is
   transported over TCP, it is the required TCP throughput.  It is
   another critical parameter for the quality of AR/VR applications.

   The AR/VR network bandwidth depends on the raw streaming data rate,
   i.e. the bit rate of the video stream.

A.2.1.  Average Throughput

   The average network bandwidth for AR/VR is the average bit rate for
   AR/VR video.

   For an AR/VR video stream, many parameters can affect the bit rate,
   such as the display resolution, 2D or 3D, normal or panoramic view,
   the video codec, the color space and sampling algorithm, the video
   pattern, etc.

   Normally, the bit rate for 3D is approximately 1.5 times that of
   2D, and the bit rate for a panoramic view is about 4 times that of
   a normal view.

   The latest codecs for high-resolution video are H.264 and H.265,
   which achieve very high compression ratios.

   The color space and sampling used in modern video streaming are the
   YUV system [YUV] and chroma subsampling [Chroma].

   YUV encodes a color image or video taking human perception into
   account, allowing reduced bandwidth for the chrominance components;
   this typically enables transmission errors or compression artifacts
   to be masked more efficiently by human perception than with a
   "direct" RGB representation.

   Chroma subsampling is the practice of encoding images by implementing
   less resolution for chroma information than for luma information,
   taking advantage of the human visual system's lower acuity for color
   differences than for luminance.

   There are different sampling systems, depending on the ratio of
   samples for the different color components, such as Y'CrCb 4:1:1,
   Y'CrCb 4:2:0,



Han & Smith            Expires September 13, 2017              [Page 15]


Internet-Draft     Transport Support for AR/VR Problem        March 2017


   Y'CrCb 4:2:2, Y'CrCb 4:4:4 and Y'CrCb 4:4:0.  The most widely used
   sampling method is Y'CrCb 4:2:0, often called YUV420 (note that the
   similar sampling scheme for analog encoding is called Y'UV).

   The video pattern, or motion rank, will also affect the stream bit
   rate: the more frequently the video frames change, the less the
   data can be compressed.

   A compressed video stream consists of an ordered succession of
   groups of pictures, or GOPs [GOP].  Three types of picture (or
   frame) are used in video compression schemes such as H.264:

   Intra coded pictures, or I-frames [GOP]; Predictive coded pictures,
   or P-frames [GOP]; and Bipredictive coded pictures, or B-frames
   [GOP].

   An I-frame is in effect a fully specified picture, like a
   conventional static image file.  P-frames and B-frames hold only part
   of the image information, so they need less space to store than an
   I-frame and thus improve video compression rates.  A P-frame holds
   only the changes in the image from the previous frame.  P-frames are
   also known as delta-frames.  A B-frame saves even more space by using
   differences between the current frame and both the preceding and
   following frames to specify its content.

   A typical video stream has a sequence of GOPs with a pattern such
   as IBBPBBPBBPBB or IBBBBPBBBBPBBBB.

   The real bit rate also depends on the image quality that the user
   wants to view.  The peak signal-to-noise ratio, or PSNR [PSNR],
   denotes the quality of an image: the higher the PSNR, the better
   the image quality, and the higher the bit rate.

   Since humans can only distinguish image-quality differences down to
   a certain level, it would be efficient for the network if we could
   provide images with the minimum PSNR that human eye perception
   cannot distinguish from images with a higher PSNR.  Unfortunately,
   this is still a research topic, and there is no fixed minimum PSNR
   that applies to all people.

   So there is no exact formula for the bit rate; however, empirical
   formulae allow rough estimates of the bit rate for different
   parameters.

   Formula (1) is from the H.264 Primer [H264_Primer]:








Han & Smith            Expires September 13, 2017              [Page 16]


Internet-Draft     Transport Support for AR/VR Problem        March 2017


       Information rate = W * H * FPS * Rank * 0.07,    (1)

   where:
      W:    Number of pixels in horizontal direction
      H:    Number of pixels in vertical direction
      FPS:  Frames per second
      Rank: Motion rank, which can be:
            1: Low motion: video that has minimal movement
            2: Medium motion: video that has some degree of movement
            4: High motion: video that has a lot of movement and
               movement is unpredictable
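
   As an illustration, formula (1) can be evaluated for a 4K, 30 FPS,
   medium-motion video (our own parameter choices); the result is of
   the same order as the 'Entry-level' mean rate derived from formula
   (2) below:

      # Formula (1): Information rate = W * H * FPS * Rank * 0.07
      W, H, FPS, rank = 3840, 1920, 30, 2  # assumed medium motion
      print(W * H * FPS * rank * 0.07 / 1e6)  # ~31 Mb/s, the same
      # order as the 22 Mb/s 'Entry-level' mean rate in Table 3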


   The four formulae tagged (2) below are more generic, with more
   parameters, for calculating approximate information rates:

       Average information rate = T * W * H * S * d * FPS / Cv )
       I-frame information rate = T * W * H * S * d * FPS / Cj )
       Burst size = T * W * H * S * d / Cj                     ) (2)
       Burst time = 1/FPS                                      )

   where:
      T:   Type of video, 1 for 2D, 2 for 3D
      W:   Number of pixels in horizontal direction
      H:   Number of pixels in vertical direction
      S:   Scale factor, which can be:
            1   for YUV400
            1.5 for YUV420
            2   for YUV422
            3   for YUV444
      d:   Color depth bits
      FPS: Frames per second
      Cv:  Average compression ratio for video
      Cj:  Compression ratio for I-frame
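
   The following Python sketch (ours) implements the four formulae in
   (2).  With the 'Entry-level' parameters from Table 3 it reproduces
   that column of the table:

      # Formula (2) for raw AR/VR information rates.
      def rates(T, W, H, S, d, FPS, Cv, Cj):
          frame_bits = T * W * H * S * d      # one uncompressed frame
          return {
              "mean_bps":    frame_bits * FPS / Cv,
              "peak_bps":    frame_bits * FPS / Cj,  # I-frame rate
              "burst_bytes": frame_bits / Cj / 8,    # one I-frame
              "burst_s":     1.0 / FPS,
          }

      # 'Entry-level' column of Table 3: 4K 2D, YUV420, 8-bit color.
      print(rates(T=1, W=3840, H=1920, S=1.5, d=8, FPS=30,
                  Cv=120, Cj=20))
      # => mean ~22.1e6, peak ~132.7e6, burst ~553e3 B, 0.033 s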


   Table 3 shows the bit rates calculated from formula (2) for
   different AR/VR levels.

   Note the following points about Table 3:

   1.  There is no industry standard for the types of VR yet.  The
   definitions in the table are simply based on 4K, 12K and 24K video
   for a 360x180 degree display.  'Ultimate' VR corresponds roughly to
   the so-called "Retina Display", which is about 60 PPD (pixels per
   degree) or 300 PPI (pixels per inch).  However, there is argument
   about the limit of human vision: J. Blackwell of the Optical
   Society of America determined in 1946 that the resolution of the


Han & Smith            Expires September 13, 2017              [Page 17]


Internet-Draft     Transport Support for AR/VR Problem        March 2017


   human eye was actually closer to 0.35 arc minutes, more than 3
   times finer than Apple's Retina Display (60 PPD).

   2.  The mean and peak bit rates in the table are calculated for a
   specific video with an acceptable perceptual PSNR and a typical
   compression ratio.  They do not represent all types of video, so
   the compression ratios in the table are not universally applicable.

   3.  Be aware that in real use cases there are many schemes that
   reduce the video bit rate further, in addition to the mandatory
   video compression; for example, transmitting the video within the
   FOV at the expected resolution and in time, while transmitting
   other areas more slowly, at lower quality and lower resolution.
   These technologies and their impact on bandwidth are outside the
   scope of this document.

   4.  We assume the whole 360-degree video is transmitted to the
   user's site.  The same video could be viewed by the naked eye or by
   an HMD (without much processing power); in that case there is no
   difference to the network in bit rate, burst size or burst time,
   only that an HMD shows just the part of the video within its view
   angle.  But if the HMD has its own video decoder and powerful
   processing, and can communicate directly with the AR/VR content
   source, the network only needs to transport the data defined by the
   HMD resolution, which is only a small percentage of the whole
   360-degree video.  The corresponding mean/peak bit rates and burst
   sizes can easily be calculated from formula (2).  The last row,
   "Infor Ratio of HMD/Whole Video", gives the ratio of information
   (mean/peak bit rate and burst size) between the HMD view and the
   whole 360-degree video.





















Han & Smith            Expires September 13, 2017              [Page 18]


Internet-Draft     Transport Support for AR/VR Problem        March 2017


   +-----------------+---------------+----------------+----------------+
   |                 | Entry-level VR|   Advanced VR  |  Ultimate VR   |
   +-----------------+---------------+----------------+----------------+
   |      Type       |   4K 2D Video |  12K 2D Video  |  24K 3D Video  |
   +-----------------+---------------+----------------+----------------+
   | Resolution W*H  |    3840*1920  |    11520*5760  |  23040*11520   |
   |360 degree video |               |                |                |
   +-----------------+---------------+----------------+----------------+
   | HMD Resolution/ |     960*960/  |    3840*3840/  |   7680*7680/   |
   |   view angle    |       90      |        120     |      120       |
   +-----------------+---------------+----------------+----------------+
   |        PPD      |       11      |       32       |       64       |
   | (Pix per degree)|               |                |                |
   +-----------------+---------------+----------------+----------------+
   |      d (bit)    |       8       |       10       |       12       |
   +-----------------+---------------+----------------+----------------+
   |       Cv        |       120     |       150      |200(2D), 350(3D)|
   +-----------------+---------------+----------------+----------------+
   |       FPS       |       30      |       60       |       120      |
   +-----------------+---------------+----------------+----------------+
   |  Mean Bit rate  |     22Mbps    |      398Mbps   |   2.87Gbps(2D) |
   |                 |               |                |   3.28Gbps(3D) |
   +-----------------+---------------+----------------+----------------+
   |       Cj        |       20      |        30      | 20(2D), 30(3D) |
   +-----------------+---------------+----------------+----------------+
   |  Peak bit rate  |     132Mbps   |      1.9Gbps   |    28.7Gbps(2D)|
   |                 |               |                |    38.2Gbps(3D)|
   +-----------------+---------------+----------------+----------------+
   |    Burst size   |    553K byte  |    4.15M Byte  |  29.9M Byte(2D)|
   |                 |               |                |  39.8M Byte(3D)|
   +-----------------+---------------+----------------+----------------+
   |    Burst time   |      33ms     |       17ms     |        8ms     |
   +-----------------+---------------+----------------+----------------+
   | Infor Ratio of  |     0.125     |      0.222     |      0.222     |
   | HMD/Whole Video |               |                |                |
   +-----------------+---------------+----------------+----------------+


       Table 3: Bit rates for different VR levels (YUV420 and H.265)

A.2.2.  Peak Throughput

   The peak bandwidth for AR/VR is the peak bit rate of an AR/VR
   video.  In this document, it is defined as the bit rate required to
   transport an I-frame; the burst size is the size of the I-frame,
   and the burst time is the time within which the I-frame must be
   transported end to end, based on the FPS.




Han & Smith            Expires September 13, 2017              [Page 19]


Internet-Draft     Transport Support for AR/VR Problem        March 2017


   As with the mean bit rate, the calculation of the peak bit rate is
   purely theoretical and does not take any optimization into account.

   There are two scenarios in which a new I-frame will be generated
   and transported: one is when the AR/VR video changes so
   dramatically that there is no similarity between two successive
   images; the other is when the FOV changes.

   When an AR/VR user moves their head, or moves their eyes to change
   the Range of Interest, the FOV changes.  An FOV change may lead to
   the transmission of a new I-frame.

   Since there is no reference frame for the video compression, an
   I-frame can only be compressed by intra-frame processing, i.e. the
   compression used for a static image such as JPEG, and the
   compression ratio is much smaller than the inter-frame compression
   ratio.

   It is estimated that normal-quality JPEG compression achieves a
   ratio of about 20 to 30, only a fraction of the compression ratio
   for normal video streaming.

   In addition to the low compression ratio, there is another problem.
   Due to the MTP limit, the new I-frame must be rendered, grouped,
   transmitted and displayed within the delay budget for network
   transport.  This makes the peak bit rate and burst size much bigger
   than for normal video streaming such as IPTV.

   The peak bit rate (the bit rate for an I-frame), burst size and
   burst time are given by formula (2).  From the formula, the ratio
   of the peak bit rate to the average bit rate is Cv/Cj.  Since Cv
   can be 100 to 200 for 2D while Cj is only about 20 to 30, the peak
   bit rate is about 10 times the average bit rate.

Authors' Addresses

   Lin Han (editor)
   Huawei Technologies
   2330 Central Expressway
   Santa Clara, CA  95050
   USA

   Phone: +1 408 330 4613
   Email: lin.han@huawei.com







Han & Smith            Expires September 13, 2017              [Page 20]


Internet-Draft     Transport Support for AR/VR Problem        March 2017


   Kevin Smith
   Vodafone
   UK

   Email: Kevin.Smith@vodafone.com














































Han & Smith            Expires September 13, 2017              [Page 21]