RFC 9699 | XR Use Case | December 2024 |
Krishna & Rahman | Informational |
This document explores the issues involved in the use of edge computing resources to operationalize a media use case that involves an Extended Reality (XR) application. In particular, this document discusses an XR application that can run on devices having different form factors (such as different physical sizes and shapes) and needs edge computing resources to mitigate the effect of problems such as the need to support interactive communication requiring low latency, limited battery power, and heat dissipation from those devices. This document also discusses the expected behavior of XR applications, which can be used to manage traffic, and the service requirements for XR applications to be able to run on the network. Network operators who are interested in providing edge computing resources to operationalize the requirements of such applications are the intended audience for this document.
This document is not an Internet Standards Track specification; it is published for informational purposes.
This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are candidates for any level of Internet Standard; see Section 2 of RFC 7841.
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc9699.
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.
Extended Reality (XR) is a term that includes Augmented Reality (AR), Virtual Reality (VR), and Mixed Reality (MR) [XR]. AR combines the real and virtual, is interactive, and is aligned to the physical world of the user [AUGMENTED_2]. On the other hand, VR places the user inside a virtual environment generated by a computer [AUGMENTED]. MR merges the real and virtual along a continuum that connects a completely real environment at one end to a completely virtual environment at the other end. In this continuum, all combinations of the real and virtual are captured [AUGMENTED].
XR applications have several requirements for the network and the mobile devices running these applications. Some XR applications (such as AR applications) require real-time processing of video streams to recognize specific objects. This processing is then used to overlay information on the video being displayed to the user. In addition, other XR applications (such as AR and VR applications) also require generation of new video frames to be played to the user. Both the real-time processing of video streams and the generation of overlay information are computationally intensive tasks that generate heat [DEV_HEAT_1] [DEV_HEAT_2] and drain battery power [BATT_DRAIN] on the mobile device running the XR application. Consequently, in order to run applications with XR characteristics on mobile devices, computationally intensive tasks need to be offloaded to resources provided by edge computing.
Edge computing is an emerging paradigm where, for the purpose of this document, computing resources and storage are made available in close network proximity at the edge of the Internet to mobile devices and sensors [EDGE_1] [EDGE_2]. A computing resource or storage is in close network proximity to a mobile device or sensor if there is a short and high-capacity network path to it such that the latency and bandwidth requirements of applications running on those mobile devices or sensors can be met. These edge computing devices use cloud technologies that enable them to support offloaded XR applications. In particular, cloud implementation techniques [EDGE_3] such as the following can be deployed:
Such techniques enable XR applications that require low latency and high bandwidth to be delivered by proximate edge devices. This is because the disaggregated components can run on proximate edge devices rather than on a remote cloud several hops away and deliver low-latency, high-bandwidth service to offloaded applications [EDGE_2].
This document discusses the issues involved when edge computing resources are offered by network operators to operationalize the requirements of XR applications running on devices with various form factors. For the purpose of this document, a network operator is any organization or individual that manages or operates the computing resources or storage in close network proximity to a mobile device or sensor. Examples of form factors include the following: 1) head-mounted displays (HMDs), such as optical see-through HMDs and video see-through HMDs, 2) hand-held displays, and 3) smartphones with video cameras and location-sensing capabilities using systems such as a global navigation satellite system (GNSS). These devices have limited battery capacity and dissipate heat when running. Also, as users of these devices move around while running the XR application, the wireless latency and bandwidth available to the devices fluctuate, and the communication link itself might fail. As a result, algorithms such as those based on Adaptive Bitrate (ABR) techniques that base their policy on heuristics or models of deployment perform sub-optimally in such dynamic environments [ABR_1]. In addition, network operators can expect that the parameters that characterize the expected behavior of XR applications are heavy-tailed. Heaviness of tails is defined as the difference from the normal distribution in the proportion of the values that fall a long way from the mean [HEAVY_TAIL_3]. Such workloads require appropriate resource management policies to be used on the edge. The service requirements of XR applications are also challenging when compared to current video applications. In particular, several Quality-of-Experience (QoE) factors such as motion sickness are unique to XR applications and must be considered when operationalizing a network. This document examines these issues with the use case presented in the following section.
This use case involves an XR application running on a mobile device. Consider a group of tourists who are taking a tour around the historical site of the Tower of London. As they move around the site and within the historical buildings, they can watch and listen to historical scenes in 3D that are generated by the XR application and then overlaid by their XR headsets onto their real-world view. The headset continuously updates their view as they move around.
The XR application first processes the scene that the walking tourist is watching in real time and identifies objects that will be targeted for overlay of high-resolution videos. It then generates high-resolution 3D images of historical scenes related to the perspective of the tourist in real time. These generated video images are then overlaid on the view of the real world as seen by the tourist.
This processing of scenes and generation of high-resolution images are discussed in greater detail below.
The task of processing a scene can be broken down into a pipeline of three consecutive subtasks: tracking, acquisition of a model of the real world, and registration [AUGMENTED].
The XR application must generate a high-quality video that has the properties described above and overlay the video on the XR device's display. This step is called "situated visualization". A situated visualization is a visualization in which the virtual objects that need to be seen by the XR user are overlaid correctly on the real world. This entails dealing with registration errors that may arise, ensuring that there is no visual interference [VIS_INTERFERE], and finally maintaining temporal coherence by adapting to the movement of the user's eyes and head.
As discussed in Section 2, the components of XR applications perform tasks that are computationally intensive, such as real-time generation and processing of high-quality video content. This section discusses the challenges such applications can face as a consequence and offers some solutions.
As a result of performing computationally intensive tasks on XR devices such as XR glasses, excessive heat is generated by the chipsets that are involved in the computation [DEV_HEAT_1] [DEV_HEAT_2]. Additionally, the battery on such devices discharges quickly when running such applications [BATT_DRAIN].
A solution to the problem of heat dissipation and battery drainage is to offload the processing and video generation tasks to the remote cloud. However, running such tasks on the cloud is not feasible as the end-to-end delays must be within the order of a few milliseconds. Additionally, such applications require high bandwidth and low jitter to provide a high QoE to the user. In order to meet such hard timing constraints, computationally intensive tasks can be offloaded to edge devices.
Another requirement for our use case and similar applications, such as 360-degree streaming (streaming of video that represents a view in every direction in 3D space), is that the display on the XR device should synchronize the visual input with the way the user is moving their head. This synchronization is necessary to avoid motion sickness that results from a time lag between when the user moves their head and when the appropriate video scene is rendered. This time lag is often called "motion-to-photon delay". Studies have shown that this delay can be at most 20 ms and preferably between 7 and 15 ms in order to avoid motion sickness [PER_SENSE] [XR] [OCCL_3]. Out of these 20 ms, display techniques including the refresh rate of write displays and pixel switching take 12-13 ms [OCCL_3] [CLOUD]. This leaves 7-8 ms for the processing of motion sensor inputs, graphic rendering, and round-trip time (RTT) between the XR device and the edge. The use of predictive techniques to mask latencies has been considered as a mitigating strategy to reduce motion sickness [PREDICT]. In addition, edge devices that are proximate to the user might be used to offload these computationally intensive tasks. Towards this end, a 3GPP study suggests an Ultra-Reliable Low Latency of 0.1 to 1 ms for communication between an edge server and User Equipment (UE) [URLLC].
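The budget arithmetic in the preceding paragraph can be made explicit. The following is a minimal sketch rather than a method taken from the cited studies; the function name processing_budget_ms and the 1 ms RTT used in the last example are illustrative assumptions.

```python
def processing_budget_ms(motion_to_photon_ms: float,
                         display_ms: float,
                         rtt_ms: float = 0.0) -> float:
    """Return the time (in ms) left for sensor processing and rendering.

    The budget is the motion-to-photon limit minus the time consumed by
    the display and by the round trip between the XR device and the edge.
    """
    remaining = motion_to_photon_ms - display_ms - rtt_ms
    if remaining < 0:
        raise ValueError("display time and RTT exceed the motion-to-photon limit")
    return remaining


# Figures from the text: 20 ms limit, 12-13 ms consumed by the display.
print(processing_budget_ms(20, 13))              # 7.0
print(processing_budget_ms(20, 12))              # 8.0
# Assumed RTT of 1 ms (upper end of the 0.1 to 1 ms range quoted above).
print(processing_budget_ms(20, 13, rtt_ms=1.0))  # 6.0
```

Under these assumptions, every millisecond of RTT comes directly out of the 7-8 ms that remain for sensor processing and rendering.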
Note that the edge device providing the computation and storage is itself limited in such resources compared to the cloud. For example, a sudden surge in demand from a large group of tourists can overwhelm the device. This will result in a degraded user experience as their XR device experiences delays in receiving the video frames. In order to deal with this problem, the client XR applications will need to use ABR algorithms that choose bitrate policies tailored in a fine-grained manner to the resource demands and play back the videos with appropriate QoE metrics as the user moves around with the group of tourists.
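As an illustration of what a bitrate policy does, the sketch below shows one simple throughput-based rule. It is not the algorithm of any cited work; the function name select_bitrate, the bitrate ladder, and the safety factor of 0.8 are all hypothetical values chosen for the example.

```python
def select_bitrate(ladder_mbps, throughput_samples_mbps, safety_factor=0.8):
    """Pick the highest bitrate that fits a discounted throughput estimate.

    The estimate is the harmonic mean of recent throughput samples, which
    is less sensitive to isolated high samples than the arithmetic mean.
    If no bitrate fits, the lowest bitrate in the ladder is returned.
    """
    if not ladder_mbps or not throughput_samples_mbps:
        raise ValueError("need at least one bitrate and one throughput sample")
    if any(s <= 0 for s in throughput_samples_mbps):
        raise ValueError("throughput samples must be positive")
    n = len(throughput_samples_mbps)
    harmonic_mean = n / sum(1.0 / s for s in throughput_samples_mbps)
    usable = safety_factor * harmonic_mean
    candidates = [b for b in sorted(ladder_mbps) if b <= usable]
    return candidates[-1] if candidates else min(ladder_mbps)


# Hypothetical ladder (Mbps) and recent throughput measurements (Mbps).
ladder = [50, 100, 200, 400]
print(select_bitrate(ladder, [400, 400, 400]))  # 200
print(select_bitrate(ladder, [10, 10]))         # 50 (nothing fits, lowest rung)
```

A rule of this kind is only as good as its throughput estimate, which is why the statistical properties of the measured parameters matter.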
However, the heavy-tailed nature of several operational parameters (e.g., buffer occupancy, throughput, client-server latency, and variable transmission times) makes prediction-based adaptation by ABR algorithms sub-optimal [ABR_2]. This is because with such distributions, the law of large numbers works too slowly (that is, the sample mean takes a long time to stabilize) [HEAVY_TAIL_2] and the mean of a sample does not equal the mean of the distribution [HEAVY_TAIL_2]; as a result, standard deviation and variance are unsuitable as metrics for such operational parameters [HEAVY_TAIL_1]. Other subtle issues with these distributions include the "expectation paradox" [HEAVY_TAIL_1] (the longer one has already waited for an event, the longer one should expect to wait further) and the mismatch between the size and count of events [HEAVY_TAIL_1]. These issues make designing an algorithm for adaptation error-prone and challenging. In addition, edge devices and communication links may fail, and logical communication relationships between various software components change frequently as the user moves around with their XR device [UBICOMP].
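The expectation paradox can be illustrated with a closed-form example. The sketch below assumes a Pareto (Type I) distribution, a commonly used heavy-tailed model that is chosen here for illustration and is not named in the text. For shape alpha > 1 and elapsed wait t at or above the scale x_m, the expected further wait is t / (alpha - 1), which grows with t; for an exponential distribution it stays constant. The function names are hypothetical.

```python
def pareto_expected_further_wait(t: float, alpha: float, x_m: float = 1.0) -> float:
    """Expected remaining wait E[X - t | X > t] for a Pareto(x_m, alpha) wait.

    Valid for alpha > 1 (finite mean) and t >= x_m.
    """
    if alpha <= 1:
        raise ValueError("the mean is infinite for alpha <= 1")
    if t < x_m:
        raise ValueError("t must be at least the scale parameter x_m")
    return t / (alpha - 1.0)


def exponential_expected_further_wait(t: float, mean: float) -> float:
    """Expected remaining wait for an exponential wait: memoryless, so constant."""
    return mean


# Assumed shape alpha = 1.5: the longer the wait so far, the longer the
# expected further wait.
for waited in (10, 100, 1000):
    print(waited, pareto_expected_further_wait(waited, alpha=1.5))
# 10 20.0
# 100 200.0
# 1000 2000.0
```

An adaptation algorithm that implicitly assumes the memoryless case would therefore underestimate how long an ongoing burst or stall is likely to last.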
As discussed in Sections 1 and 3, the parameters that capture the characteristics of XR application behavior are heavy-tailed. Examples of such parameters include the distribution of arrival times between XR application invocations, the amount of data transferred, and the inter-arrival times of packets within a session. As a result, any traffic model based on such parameters is also heavy-tailed. Using these models to predict performance under alternative resource allocations by the network operator is challenging. For example, both the uplink and downlink traffic of a user device have parameters such as volume of XR data, burst time, and idle time that are heavy-tailed.
Table 1 below shows various streaming video applications and their associated throughput requirements [METRICS_1]. Since our use case envisages a 6 degrees of freedom (6DoF) video or point cloud, the table indicates that it will require 200 to 1000 Mbps of bandwidth. Also, the table shows that XR applications, such as the one in our use case, transmit a larger amount of data per unit time as compared to regular video applications. As a result, issues arising from heavy-tailed parameters, such as long-range dependent traffic [METRICS_2] and self-similar traffic [METRICS_3], would be experienced at timescales of milliseconds and microseconds rather than hours or seconds. Additionally, burstiness at the timescale of tens of milliseconds due to the multi-fractal spectrum of traffic will be experienced [METRICS_4]. Long-range dependent traffic can have long bursts, and various traffic parameters from widely separated times can show correlation [HEAVY_TAIL_1]. Self-similar traffic contains bursts at a wide range of timescales [HEAVY_TAIL_1]. Multi-fractal spectrum bursts for traffic summarize the statistical distribution of local scaling exponents found in a traffic trace [HEAVY_TAIL_1]. The operational consequence of XR traffic having characteristics such as long-range dependency and self-similarity is that the edge servers to which multiple XR devices are connected wirelessly could face long bursts of traffic [METRICS_2] [METRICS_3]. In addition, multi-fractal spectrum burstiness at the scale of milliseconds could induce jitter contributing to motion sickness [METRICS_4]. This is because bursty traffic combined with variable queueing delays leads to large delay jitter [METRICS_4]. The operators of edge servers will need to run a "managed edge cloud service" [METRICS_5] to deal with the above problems. 
Functionalities that such a managed edge cloud service could operationally provide include dynamic placement of XR servers, mobility support, and energy management [METRICS_6]. Providing support for edge servers in techniques such as those described in [RFC8939], [RFC9023], and [RFC9450] could guarantee performance of XR applications. For example, these techniques could be used for the link between the XR device and the edge as well as within the managed edge cloud service. Another option for network operators could be to deploy equipment that supports differentiated services [RFC2475] or per-connection Quality-of-Service (QoS) guarantees using RSVP [RFC2210].
Thus, the provisioning of edge servers (in terms of the number of servers, the topology, the placement of servers, the assignment of link capacity, CPUs, and Graphics Processing Units (GPUs)) should be performed with the above factors in mind.
Application | Throughput Required |
---|---|
Real-world objects annotated with text and images for workflow assistance (e.g., repair) | 1 Mbps |
Video conferencing | 2 Mbps |
3D model and data visualization | 2 to 20 Mbps |
Two-way 3D telepresence | 5 to 25 Mbps |
Current-Gen 360-degree video (4K) | 10 to 50 Mbps |
Next-Gen 360-degree video (8K, 90+ frames per second, high dynamic range, stereoscopic) | 50 to 200 Mbps |
6DoF video or point cloud | 200 to 1000 Mbps |
The performance requirements for XR traffic have characteristics that need to be considered when operationalizing a network. These characteristics are discussed in this section.
The bandwidth requirements of XR applications are substantially higher than those of video-based applications.
The latency requirements of XR applications have been studied recently [XR_TRAFFIC]. The following characteristics were identified:
Additionally, XR applications interact with each other on a timescale of an RTT propagation, and this must be considered when operationalizing a network.
Table 2 [METRICS_6] shows a taxonomy of applications with their associated required response times and bandwidths. Response times can be defined as the time interval between the end of a request submission and the end of the corresponding response from a system. If the XR device offloads a task to an edge server, the response time of the server is the RTT from when a data packet is sent from the XR device until a response is received. Note that the required response time provides an upper bound for the sum of the time taken by computational tasks (such as processing of scenes and generation of images) and the RTT. This response time depends only on the QoS required by an application. The response time is therefore independent of the underlying technology of the network and the time taken by the computational tasks.
Our use case requires a response time of at most 20 ms and preferably between 7 and 15 ms, as discussed earlier. This requirement for response time is similar to the first two entries in Table 2. Additionally, the required bandwidth for our use case is 200 to 1000 Mbps (see Section 4.1). Since our use case envisages multiple users running the XR application on their devices and connecting to the edge server that is closest to them, the number of connections subject to these latency and bandwidth requirements will grow linearly with the number of users. The operators should match the network provisioning to the maximum number of tourists that can be supported by a link to an edge server.
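The linear relationship between users and bandwidth lends itself to a simple dimensioning check. The following is a minimal sketch; the function names and the 10 Gbps link capacity are hypothetical, and it deliberately ignores protocol overhead and the burstiness discussed in Section 4.1, so it gives an upper bound on the number of users rather than a safe operating point.

```python
def max_users_per_link(link_capacity_mbps: float, per_user_mbps: float) -> int:
    """Upper bound on concurrent XR users a link can carry at a given per-user rate."""
    if link_capacity_mbps < 0 or per_user_mbps <= 0:
        raise ValueError("capacity must be non-negative and per-user rate positive")
    return int(link_capacity_mbps // per_user_mbps)


def required_capacity_mbps(users: int, per_user_mbps: float) -> float:
    """Aggregate bandwidth needed when demand grows linearly with the user count."""
    return users * per_user_mbps


# Assumed 10 Gbps (10,000 Mbps) link to the edge server; per-user rates are
# the 200 to 1000 Mbps range quoted for the use case.
print(max_users_per_link(10_000, 200))   # 50
print(max_users_per_link(10_000, 1000))  # 10
print(required_capacity_mbps(30, 200))   # 6000
```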
Application | Required Response Time | Expected Data Capacity | Possible Implementations/ Examples |
---|---|---|---|
Mobile XR-based remote assistance with uncompressed 4K (1920x1080 pixels) 120 fps HDR 10-bit real-time video stream | Less than 10 milliseconds | Greater than 7.5 Gbps | Assisting maintenance technicians, Industry 4.0 remote maintenance, remote assistance in robotics industry |
Indoor and localized outdoor navigation | Less than 20 milliseconds | 50 to 200 Mbps | Guidance in theme parks, shopping malls, archaeological sites, and museums |
Cloud-based mobile XR applications | Less than 50 milliseconds | 50 to 100 Mbps | Google Live View, XR-enhanced Google Translate |
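The data capacity in the first row of Table 2 can be approximated from the stream parameters given there. The following is a minimal sketch, assuming three color components per pixel at 10 bits each, no chroma subsampling, and no blanking or protocol overhead; none of these assumptions is stated in the table, and the function name is hypothetical.

```python
def raw_video_bitrate_gbps(width: int, height: int, fps: int,
                           bits_per_component: int, components: int = 3) -> float:
    """Bitrate of an uncompressed video stream in Gbps (10^9 bits per second)."""
    bits_per_second = width * height * fps * bits_per_component * components
    return bits_per_second / 1e9


# Pixel dimensions, frame rate, and bit depth as given in the table row.
print(round(raw_video_bitrate_gbps(1920, 1080, 120, 10), 2))  # 7.46
```

Under these assumptions, the result of about 7.46 Gbps is close to the 7.5 Gbps figure in the table. The bitrate scales linearly with the pixel count, so a 3840x2160 frame at the same frame rate and bit depth would need four times as much.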
In order to operationalize a use case such as the one presented in this document, a network operator could dimension their network to provide a short and high-capacity network path from the edge computing resources or storage to the mobile devices running the XR application. This is required to ensure a response time of at most 20 ms and preferably between 7 and 15 ms. Additionally, a bandwidth of 200 to 1000 Mbps is required by such applications. To deal with the characteristics of XR traffic as discussed in this document, network operators could deploy a managed edge cloud service that operationally provides dynamic placement of XR servers, mobility support, and energy management. Although the use case is technically feasible, economic viability is an important factor that must be considered.
This document has no IANA actions.
The security issues for the presented use case are similar to those described in [DIST], [NIST1], [CWE], and [NIST2]. This document does not introduce any new security issues.
Many thanks to Spencer Dawkins, Rohit Abhishek, Jake Holland, Kiran Makhijani, Ali Begen, Cullen Jennings, Stephan Wenger, Eric Vyncke, Wesley Eddy, Paul Kyzivat, Jim Guichard, Roman Danyliw, Warren Kumari, and Zaheduzzaman Sarker for providing helpful feedback, suggestions, and comments.