Dynamic Foveated Rendering for VR and AR

In a previous article on Eye tracking, I touched on the subject of Dynamic Foveated Rendering.

I say dynamic, in contrast to Fixed Foveated Rendering, because both the Oculus Go and now Varjo's VR1 headset already offer the fixed variety in a sense. Both of them provide 'full resolution' only at the center of the screen, and surround it with a lower resolution 'containing area'.

Why do this?

To avoid drawing as many pixels. The more pixels you draw (coupled with overdraw, where the same pixel is written to multiple times), the slower it will be to render your frame buffer. So if we can draw larger, lower resolution pixels in the non-central areas of the display, then we can finish rendering the frame buffer more quickly.

Isn't that cheating?

Well... If you did the same thing on a desktop/console/mobile display, people would complain that the edges are blurry. Oculus do it in software on the Go headset, whereas Varjo do it in hardware, by having a normal-resolution "Context screen" that is 1,440 x 1,600 pixels per eye, and then overlaying another micro display that is 3,000 pixels per inch, or 60 pixels per degree!

They call this the "focus screen", and it is described as astounding by uploadvr.com and others. 3,000 pixels per inch sounds like a lot, right? There is a catch. Varjo's focus screen's resolution is 1,920 x 1,080 pixels. They claim an "unprecedented 60 pixels per degree", so this works out at 32 degrees of horizontal coverage by 18 degrees of vertical coverage. Obviously they overlay it at the centre of the context screen, giving the centre of the viewing area the highest resolution.

What happens if I look away from the centre?

Then you're back in normal resolution land; things are going to be just as blurry as on current headsets. This is annoying because Varjo claim that they have made:

"the world's only human -eye resolution headset" - varjo.com

This statement is probably true - when looking straight ahead! I do not want to be unfair to them, but eye movement in VR is more frequent than head movement, and every time you look away from their focus screen, you lose some or all of its benefit.

Can it be done better?

Let's explore this:

The eye tracking shown here is from Michael Abrash's 2018 keynote, courtesy of Oculus.

He notes:

"while it is still a risk, but I am more comfortable in saying that highly reliable eye tracking will be here within four years. After all, it obviously works in Half Dome today, and while that is not the same as working across the entire population in a shipping product, getting the rest of the way there should be doable" - Michael Abrash, 2018 Oculus Keynote

Even if the eye tracking system is only accurate to 5 degrees, this should be enough to enable dynamic foveated rendering. In Nvidia's example above, only the area within the red corners would need to be rendered at full resolution, while the area outside the red corners can be rendered at lower resolutions, based on distance.
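
As a back-of-the-envelope check (the fovea half-angle and pixel density below are my own assumptions, not figures from Nvidia or Oculus), even a 5 degree tracking error still leaves the full-resolution region fairly small:

```python
# Rough sanity check: how large does the full-resolution region need to be?
# Assumed numbers: the fovea covers roughly +/- 2.5 degrees, padded on each
# side by the tracker's worst-case error.
FOVEA_RADIUS_DEG = 2.5       # approximate half-angle of the fovea centralis (assumption)
TRACKER_ERROR_DEG = 5.0      # worst-case eye tracking error from the text
PIXELS_PER_DEGREE = 20       # display density used later in this article

full_res_span_deg = 2 * (FOVEA_RADIUS_DEG + TRACKER_ERROR_DEG)
full_res_span_px = full_res_span_deg * PIXELS_PER_DEGREE

print(f"Full-resolution region: {full_res_span_deg:.0f} degrees "
      f"~= {full_res_span_px:.0f} pixels across")
# -> 15 degrees ~= 300 pixels across, in the same ballpark as the
#    240 x 270 full-resolution aggregate-buffer used later in this article.
```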

Can a micro display like Varjo's help?

Probably not, unless you can reflect/project it into different positions, appropriate to where the user is looking, on top of the context display. What we need are the highest resolution screens we can get, and then to intelligently drive them based on where the user is looking.

Which resolution is feasible within the next 2 years?

The VirtualLink specification offers a clue. It is backed by Nvidia, AMD, Valve, Microsoft, Oculus and HTC, and

"includes support for four lanes of HBR3 DisplayPort for high-resolution displays, USB 3.1 Gen2 (SuperSpeed USB 10Gbps) for headset cameras and sensors"

What is HBR3 DisplayPort? It is the fastest transmission mode defined in version 1.4 of the DisplayPort standard, and allows for an overall transmission bandwidth of 32.4 Gbit/s (8.1 Gbit/s per lane), for a total data throughput of 25.92 Gbit/s after factoring in 8b/10b encoding overhead. This is enough for a 4K UHD display (3,840 x 2,160) at 120 Hz with 24 bit/px RGB color, or more usefully for VR, two 4K UHD (3,840 x 2,160) displays at 60 Hz.

How do we calculate the bandwidth requirement?

3,840 x 2,160 x 24 bits x 2 displays x 60 Hz = 23,887,872,000 bits or 23.9 Gigabits per second, well within the 25.92 Gigabits per second that the standard provides for.
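
For reference, here is a minimal sketch of that arithmetic (the function and variable names are my own):

```python
def vr_bandwidth_gbps(width, height, bits_per_pixel, displays, refresh_hz):
    """Raw, uncompressed pixel bandwidth in gigabits per second."""
    return width * height * bits_per_pixel * displays * refresh_hz / 1e9

# Two 4K UHD displays, 24 bit/px RGB, 60 Hz:
print(vr_bandwidth_gbps(3840, 2160, 24, 2, 60))   # -> 23.887872 Gbit/s
# HBR3 payload after 8b/10b encoding: 32.4 Gbit/s * 0.8 = 25.92 Gbit/s
```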

What sort of aspect ratio should we go for?

In VR, 16:9 resolutions are less common; both the Oculus Quest and the Vive Pro operate at 1,440 x 1,600 per eye, a much more square aspect ratio of 14.4:16.

So let's aim for a similar aspect ratio (14.2:16) to the Vive Pro and the Quest, but with more resolution, more color depth and a 90 Hz frame rate:

Potential display Type A-2 yields us 1,920 x 2,160 x 32 bits x 2 displays x 90 Hz = 23,887,872,000 bits or 23.9 Gigabits per second. Identical to the number above, and well within the 25.92 Gbit/s threshold that the VirtualLink standard provides for.

Meanwhile potential display Type A-1 forgoes the substantial benefits of HDR color and runs at a faster refresh rate: 1,920 x 2,160 x 24 bits x 2 displays x 120 Hz = 23,887,872,000 bits or 23.9 Gigabits per second. From a hardware point of view it may be sensible to support both HDR @ 90 Hz and non-HDR @ 120 Hz modes, and allow software to choose which one to use on a per-application basis.
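
The same check in code form (a minimal sketch; the Type A-1 / A-2 labels are the ones used above):

```python
VIRTUALLINK_LIMIT_GBPS = 25.92  # HBR3 payload after 8b/10b encoding

type_a1_gbps = 1920 * 2160 * 24 * 2 * 120 / 1e9   # non-HDR @ 120 Hz
type_a2_gbps = 1920 * 2160 * 32 * 2 * 90 / 1e9    # HDR @ 90 Hz

for name, gbps in [("Type A-1", type_a1_gbps), ("Type A-2", type_a2_gbps)]:
    verdict = "within" if gbps <= VIRTUALLINK_LIMIT_GBPS else "over"
    print(f"{name}: {gbps:.2f} Gbit/s ({verdict} the VirtualLink budget)")
# Both print 23.89 Gbit/s, comfortably inside 25.92 Gbit/s.
```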

1,920 x 2,160 x 2 displays is the same as rendering to a 4K UHD buffer (3,840 x 2,160), but thanks to eye-tracked foveated rendering we only need, per eye, a quarter-resolution aggregate-buffer covering the whole view (480 x 540), a half-resolution aggregate-buffer covering a middle region (240 x 270), and a full-resolution aggregate-buffer covering the foveal region (240 x 270). Assuming the GPU supports multiple render targets, we can draw to all three aggregate-buffers (per eye) within the same draw call.

If we are really lucky and have Simultaneous Multi-Projection support on the GPU, then we could actually draw to all six buffers at the same time.

Based on the numbers above we will be drawing to 480 x 540 x 2 + 240 x 270 x 2 + 240 x 270 x 2 buffers, for a total area of 518,400 + 129,600 + 129,600 = 777,600 pixels. Of course we will probably still need to upscale to our original buffer, although hardware that can read from multiple source frame buffers and composite the result 'on the fly' already exists (Xbox One used this to separate GUI graphics from gameplay resolution). Our full buffer size is 3,840 x 2,160, which is 8,294,400 pixels, whereas the area we are drawing to is (777,600 / 8,294,400) x 100 = 9.375% of that.
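
A small sketch of that area calculation, using the buffer sizes from the example above:

```python
# Per-eye aggregate-buffer sizes: quarter resolution covering the whole view,
# half resolution over a middle region, full resolution over the foveal region.
buffers_per_eye = {
    "quarter": (480, 540),
    "half":    (240, 270),
    "full":    (240, 270),
}

drawn_pixels = sum(w * h for w, h in buffers_per_eye.values()) * 2  # both eyes
full_buffer_pixels = 3840 * 2160                                    # 4K UHD target

print(drawn_pixels)                              # -> 777600
print(100 * drawn_pixels / full_buffer_pixels)   # -> 9.375 (%)
```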

Overall this is a huge performance win - we can render to a buffer that provides 1,920 x 2,160 per eye at roughly 10% of the pixel cost we would expect under a non-foveated render pipeline. And the full resolution will follow our eyes' current rotation, for example when the user is looking left.



Another observation is that the sizes of the aggregate-buffers I've used in this example could be tuned in real time based on the current frame rate of the app. You could allocate larger buffers, and then choose dynamically how much of them to use, based on the current frame rate.
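
A minimal sketch of that idea (the target refresh rate, thresholds and step sizes below are illustrative assumptions, not measured values):

```python
# Illustrative only: choose what fraction of the pre-allocated aggregate-buffers
# to actually render into this frame, based on how close we are to missing vsync.
TARGET_FRAME_MS = 1000.0 / 90.0   # 90 Hz target, as for display Type A-2

def next_buffer_scale(last_frame_ms, current_scale):
    """Shrink the rendered area when we run long; grow it back when there is headroom."""
    if last_frame_ms > TARGET_FRAME_MS * 0.95:
        return max(0.6, current_scale - 0.05)   # back off before dropping a frame
    if last_frame_ms < TARGET_FRAME_MS * 0.80:
        return min(1.0, current_scale + 0.05)   # slowly recover quality
    return current_scale

# e.g. render into a sub-rectangle of the 480 x 540 quarter-resolution buffer:
scale = next_buffer_scale(last_frame_ms=11.5, current_scale=1.0)
print(int(480 * scale), int(540 * scale))
```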

For brevity my calculations deliberately ignored the stencil/Z buffer. We can give each aggregate-buffer an appropriate (filled circle / inverse filled circle) mask to prevent most pixels from being drawn to more than one buffer in the same pass, in the positions where the buffers overlap each other.
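
For illustration, such a mask could be generated once per aggregate-buffer size; here is a sketch using NumPy, with placeholder radii:

```python
import numpy as np

def circular_mask(width, height, radius_px, inverse=False):
    """Boolean mask: True where this aggregate-buffer should accept pixel writes."""
    ys, xs = np.ogrid[:height, :width]
    cx, cy = width / 2.0, height / 2.0
    inside = (xs - cx) ** 2 + (ys - cy) ** 2 <= radius_px ** 2
    return ~inside if inverse else inside

# The full-resolution buffer draws only inside the circle; the buffer underneath
# uses the inverse mask over the region where the two overlap.
full_mask = circular_mask(240, 270, radius_px=120)
overlap_mask = circular_mask(240, 270, radius_px=120, inverse=True)
```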

As a strategic choice we can add MSAA, FSAA or another anti-aliasing technique to the full-resolution aggregate-buffer pair, to improve the image quality where it matters most, at minimal cost.

Here is a good article on the way in which the fovea centralis, located in the center of the retina, has a higher density of photosensitive cells than the rest of the eye, making foveated rendering possible.

Finally, what Field of View could this resolution be mapped to?

  • At 20 pixels per degree, the 1,920 x 2,160 example above would give us 96 degrees horizontal FOV, and 108 degrees vertical FOV.
  • At 15 pixels per degree, the 1,920 x 2,160 example above would give us 128 degrees horizontal FOV, and 144 degrees vertical FOV.

For Comparison, Varjo's VR1 headset offers an 87 degree FOV for the "Context Screen", and 32 degrees horizontal FOV by 18 degrees vertical FOV for the "Focus Screen".
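
As a cross-check, all of these figures follow from the same pixels-per-degree division (a small sketch; the resolutions are the ones quoted above):

```python
def fov_degrees(width_px, height_px, pixels_per_degree):
    """Horizontal and vertical field of view covered at a given pixel density."""
    return width_px / pixels_per_degree, height_px / pixels_per_degree

print(fov_degrees(1920, 2160, 20))   # -> (96.0, 108.0)
print(fov_degrees(1920, 2160, 15))   # -> (128.0, 144.0)
print(fov_degrees(1920, 1080, 60))   # -> (32.0, 18.0)  Varjo's focus screen
```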

This means you are only getting the benefit of the Focus Screen when you're looking straight ahead.

While there will doubtless be use cases for this, in reality we move our eyes quite a lot. And reality, be it virtual or augmented, is what we are pursuing, ultimately.
