The (Tech) Art of Performing Mobile VR
Developing a 3D game for mobile is already a challenge performance-wise. Now imagine making one for VR, where the scene is rendered twice (once per eye), while you speed through the environment.
Last year I worked on Coaster Combat, built with Unreal Engine 4 (UE4). It's a roller coaster shooter for the Gear VR (S7+) and Oculus Go. My main task was making sure the game kept performing well, both during development and at release.
With this article I want to share how this was achieved by talking through the problems and solutions I found while working on this project. Some words are hyperlinks to websites with more information on that topic.
Game thread - performing logic
The ticking bomb
Avoid calculating things every frame (tick) whenever possible! Ask yourself whether you really need to do something every frame or whether doing it every 2 seconds works just as well. If you are polling for a condition every frame, remove the check and react to event dispatchers instead. Blueprint logic that does have to run every tick can be optimized by converting it to C++.
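To make the "every 2 seconds instead of every frame" idea concrete, here is a minimal sketch in plain Python (not UE4 code). The names ThrottledCheck and simulate are made up for this illustration; in UE4 you would typically reach for a timer or a larger tick interval on the actor instead.

```python
class ThrottledCheck:
    """Runs a callback at a fixed interval instead of on every tick."""

    def __init__(self, interval_seconds, callback):
        self.interval = interval_seconds
        self.callback = callback
        self.elapsed = 0.0
        self.run_count = 0

    def tick(self, delta_seconds):
        # Accumulate frame time; only do the expensive work once per interval.
        self.elapsed += delta_seconds
        if self.elapsed >= self.interval:
            self.elapsed -= self.interval
            self.run_count += 1
            self.callback()


def simulate(frames, fps, interval_seconds):
    """Return how many times the check ran over the given number of frames."""
    check = ThrottledCheck(interval_seconds, callback=lambda: None)
    for _ in range(frames):
        check.tick(1.0 / fps)
    return check.run_count
```

The cheap accumulation still happens every frame, but the expensive check runs only a handful of times over hundreds of frames.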
Watch where you are going!
Moving an object that has collision is heavy, especially if it moves constantly, since it has to do a collision check every tick. Turn off collision where possible.
Prevent duplicate behavior
If you want 100 objects to move using a timeline, like the floating soul orbs in our case, calculate this movement in one place that all the orbs can read from. That way you won't have 100 orbs each evaluating a timeline individually. We also added an optimization where a collectible orb is only active while it is being rendered.
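A rough sketch of that shared evaluation, in plain Python rather than Blueprint. SharedBobTimeline, Orb, and the sine-based motion are assumptions for the example, not the actual Coaster Combat implementation.

```python
import math


class SharedBobTimeline:
    """Evaluates the floating motion once per frame for all orbs."""

    def __init__(self, amplitude, period_seconds):
        self.amplitude = amplitude
        self.period = period_seconds
        self.time = 0.0
        self.offset = 0.0
        self.evaluations = 0

    def tick(self, delta_seconds):
        # The only place where the curve is evaluated.
        self.time += delta_seconds
        phase = 2.0 * math.pi * self.time / self.period
        self.offset = self.amplitude * math.sin(phase)
        self.evaluations += 1


class Orb:
    def __init__(self, base_height, timeline):
        self.base_height = base_height
        self.timeline = timeline
        self.is_rendered = True

    def height(self):
        # Orbs that are not rendered stay inactive at their base height.
        if not self.is_rendered:
            return self.base_height
        # Rendered orbs just read the shared, already computed offset.
        return self.base_height + self.timeline.offset
```

With 100 orbs sharing one timeline, a frame costs one curve evaluation plus 100 cheap reads instead of 100 evaluations.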
Creation and destruction
Spawning and destroying actors can cause a noticeable hitch. Try to spawn as many objects as possible while the level is loading, so there are no hitches during play.
An example of this optimisation: when a pistol shot hits the surroundings, an impact particle effect is spawned, and when the effect finishes it gets destroyed. To optimize this, spawn 2 impact particle effects at the start of the level and alternately move them to the right position and activate them. This avoids constantly spawning and destroying them. We need 2 particle effects because the player can trigger one every 0.3 seconds and the effect lasts for 0.5 seconds, so at most 2 can be playing at the same time.
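The pool size follows from the two timings: you need enough instances to cover the effect duration at the fastest trigger rate. Below is a small plain-Python sketch of that arithmetic and of a round-robin pool. pool_size and ImpactEffectPool are hypothetical names, and the dictionaries stand in for real particle system components.

```python
import math


def pool_size(effect_duration, min_trigger_interval):
    """Smallest number of instances so that a free one is always available."""
    return math.ceil(effect_duration / min_trigger_interval)


class ImpactEffectPool:
    """Pre-spawns a fixed number of effects and reuses them in turn."""

    def __init__(self, size):
        # Created once, at level load, instead of on every shot.
        self.effects = [{"position": None, "active": False} for _ in range(size)]
        self.next_index = 0

    def play_at(self, position):
        # Move the next pooled effect to the hit position and (re)activate it.
        effect = self.effects[self.next_index]
        effect["position"] = position
        effect["active"] = True
        self.next_index = (self.next_index + 1) % len(self.effects)
        return effect
```

With a duration of 0.5 seconds and a trigger interval of 0.3 seconds, pool_size returns 2, matching the number used in the game.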
Render thread
UE4 shader complexity view is not 100% representative
First of all, set your preview mode to "Mobile Preview" to get a more accurate shader complexity view.
The UE4 shader complexity view just counts shader instructions. If the shader has an "if" statement halfway through that exits early, the shader is cheaper than the complexity view indicates. This is the case for masked materials. If a shader contains a loop, it can be much more expensive than indicated. A texture sample node and a multiply node are both counted as 1 instruction, even though the texture sample is a much heavier operation. So treat the view as a rough guide, and optimize shaders with many texture lookups.
Make your shader complexity bar more strict
By default the shader complexity bar goes up to 2000. When developing for mobile you want a stricter limit, such as 200. In the DefaultEngine.ini in your project folder, or the BaseEngine.ini in your engine folder, add "MaxPixelShaderAdditiveComplexityCount=200" under "[/Script/Engine.Engine]".
Midgame shader compilation
Troubled by hitches during play? They could be caused by shader compilation for an object you spawn dynamically. In our case, enemies were spawned dynamically. At first we worked around this by spawning one enemy while the level was loading, so the shader compilation happened then and not during play. In the end we used a shader cache. More on that here.
Texture groups and memory bandwidth
In UE4, textures are sorted into texture groups. Per group you can change settings through the BaseDeviceProfiles.ini file. For example, you can set all lightmap textures to a bilinear sampling filter (which is cheaper but can cause some visual artifacts) or cap all character textures at a max resolution of 2048.
One great use of these texture groups is testing whether memory bandwidth is your bottleneck. Set the LODBias to 2 on all groups. This drops two mip levels, so every texture ends up with 1/16th of its original pixels. Since textures are big memory-wise, if this fixes your performance, you are memory bandwidth bound: the hardware has to read too much data at once. Using fewer textures or decreasing their resolution can fix this. Also check what is being referenced in memory during play, following the guide below; you may be accidentally referencing things you don't want.
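The 1/16th figure comes from mip arithmetic: each LOD bias step halves both width and height. A quick plain-Python check (the function names are invented for this example):

```python
def size_after_lod_bias(width, height, lod_bias):
    """Texture dimensions after dropping `lod_bias` mip levels."""
    # Each mip level halves both dimensions, down to a minimum of 1 pixel.
    return max(1, width >> lod_bias), max(1, height >> lod_bias)


def pixel_fraction(lod_bias):
    """Fraction of the original pixel count that remains."""
    # Halving two dimensions quarters the pixel count per level.
    return 1.0 / (4 ** lod_bias)
```

A 2048x2048 texture with a LOD bias of 2 becomes 512x512, which is 1/16th of the original pixel count.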
Memory usage
To see what is being referenced in memory at a point in time, use the console command "memreport -full". This creates a file in "YourGame/Saved/Profiling/MemReports" with information on what is being referenced in memory. See this article for more info. It can also help you spot things that are referenced in memory unnecessarily.
An example: in a memory report I took while playing a level, I spotted that all 3 player carts were loaded in memory when it should have been just 1. Connected to these player carts were unique weapons, particle effects, UI and textures. Following the reference viewer, I found a trigger on the track whose Blueprint tried to get the correct level by casting to all 3 player carts. Casting to a Blueprint class creates a hard reference to it, so this effectively loaded all of them into memory. It was quickly fixed, and much less memory was being referenced afterwards.
Drawcalls
A drawcall is basically a bundle of information (shaders, textures, buffers etc.) that the CPU collects and sends to the GPU in order to render something. Drawcalls are counted per material on an object. For example: we have object A with 3 different materials (wood, stone, metal) and object B with 2 materials (wood, stone).
In total these 2 objects cost 5 drawcalls. Even though they share materials, the drawcalls still need to happen for each object individually.
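The counting rule from this example can be written down in a few lines of plain Python. This is a simplified model that assumes one drawcall per material slot per object and ignores anything the engine may batch; count_drawcalls is a made-up helper.

```python
def count_drawcalls(objects):
    """Sum material slots over all objects; shared materials still count per object."""
    return sum(len(materials) for materials in objects.values())


scene = {
    "A": ["wood", "stone", "metal"],
    "B": ["wood", "stone"],
}
total = count_drawcalls(scene)  # 3 for object A plus 2 for object B
```

Note that the number of unique materials (3) is not what matters here; the per-object total (5) is.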
On mobile we had a strict budget between 30 and 60 drawcalls at any time. How the hell do we manage that when we are on a roller coaster speeding through environments?
To stay within this limit we applied several tricks. First, we atlased the textures of all props that shared the same base material (opaque) into one big texture. This meant 80 percent of all our assets were using the same material. Separate objects still count as separate drawcalls, but with UE4's HLOD system you can merge those objects into one. That way a cluster of many objects can be just 1 drawcall.
Secondly, we set a tight cull range using "cull distance volumes", and combined that with either preventing the player from seeing far into the distance or clouding the player's view with a cheap fog. This way we draw as little as possible.
We need more cowbell! And fewer triangles
We put our triangle budget between 80k and 100k. To reach these numbers you have to be strict when modelling assets. We created 2 or 3 LODs for every asset. In the HLOD world settings you can make an HLOD level use a specific LOD level of the meshes it combines. We used this to our advantage and built up an HLODed environment with different HLOD levels that switch based on how close the player is.
Daan Niphuis made extensive improvements to the HLOD tool, including one that removed lots of triangles. Since the player rides on a track, the camera is always in the same places along that track. A system was built that measures which triangles are visible from start to finish of the track. Triangles that are never visible are culled from the resulting HLOD.
Since the HLOD tool first packs UVs and does the culling afterwards, a lot of UV space is left unused. Daan's second optimization was to pack the UVs after triangle culling is finished.
Loading collision meshes took a long time. To combat this the triangle culling system was also applied to the collision meshes.
Precomputed visibility volumes
These volumes generate visibility cells above surfaces in your level, and each cell stores what is visible from within it. It basically acts as cached occlusion. This feature has a couple of limitations but can be very powerful, so definitely check whether it fits your game. Because we stream levels in our game, we could not make use of this feature.
Overdraw
A bad example of overdraw is when your screen first draws the skysphere (filling all pixels) and then draws the level (redrawing a lot of pixels). Usually cases aren't this extreme, but it explains the problem of overdraw.
When I investigated render performance using the Intel GPA Frame Analyzer, I noticed the following render order: opaque materials, masked materials, translucent materials, materials on skinned objects. This meant that a lot of overdraw was happening. This was fixed in code so that draw priority is based on how close an object is to the camera. If you have a skysphere or other distant object, you can turn on 'Treat As Background for Occlusion' in the details panel.
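To see why draw order matters, here is a toy model in plain Python. It assumes the GPU rejects hidden pixels with a depth test before shading them, and it uses a made-up 8-pixel screen with three opaque objects; it is not how UE4's renderer is implemented.

```python
def shaded_pixels(draw_order, screen_pixels):
    """Count pixel shader runs for opaque objects drawn in the given order.

    draw_order is a list of (depth, pixel_indices); smaller depth is closer.
    """
    depth_buffer = [float("inf")] * screen_pixels
    shaded = 0
    for depth, pixels in draw_order:
        for p in pixels:
            # Depth test: only shade if nothing closer was drawn here already.
            if depth < depth_buffer[p]:
                depth_buffer[p] = depth
                shaded += 1
    return shaded


sky = (1000.0, range(0, 8))   # fills the whole screen
level = (50.0, range(0, 6))   # covers most of the sky
prop = (5.0, range(0, 2))     # small object close to the camera

back_to_front = shaded_pixels([sky, level, prop], 8)
front_to_back = shaded_pixels(sorted([sky, level, prop], key=lambda o: o[0]), 8)
```

Drawing the sky first shades 16 pixels on an 8-pixel screen, while sorting by distance to the camera shades each pixel exactly once.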
Another obvious case of overdraw is the (over)use of transparency. Try to find tricks around using transparency. In several cases we used transparency in combination with an unlit shader.
Automated performance test setup and tools of the trade
To keep track of performance, an automated test system was put in place. One of each targeted hardware device was connected to a server. At night this server would run the latest build on these devices and send out an email with the test results. We could also trigger tests ourselves with a normal or custom build. This was instrumental in getting insight into our overall performance on each device and spotting problems quickly.
If the test result charts showed high drawcall counts, that was a hint that the cull distance volumes were not working or the HLODs were not built. If the game thread was high, I would record a CPU profile and view it in the UE4 session frontend profiler to spot spikes or overall high offenders. The GPU was really hard to put a finger on, since it was impossible to get stats on it for mobile. If there were no high offenders in the charts, we had to presume it was the GPU or the memory.
To check if the GPU was the bottleneck, we lowered the screen percentage (drawing fewer pixels) to see if performance improved. To check if memory was the bottleneck, I used the low texture resolution method described above. I regularly used the Intel GPA Frame Analyzer to take a closer look at the rendering process. Even though we couldn't use its timings, since it only runs on PC, it still gave valuable insights.
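The order of those checks can be summarized as a small decision function. This is only a plain-Python sketch of the reasoning in this section; the function name, the boolean inputs, and the returned strings are invented for illustration and are not part of our test system.

```python
def guess_bottleneck(drawcalls_high, game_thread_high,
                     faster_with_lower_screen_percentage,
                     faster_with_lower_texture_resolution):
    """Walk through the checks in the order described in the article."""
    if drawcalls_high:
        return "drawcalls: check cull distance volumes and HLOD builds"
    if game_thread_high:
        return "game thread: record a CPU profile and look for spikes"
    # No stats on mobile for these two, so they are tested by experiment.
    if faster_with_lower_screen_percentage:
        return "GPU: too much pixel work"
    if faster_with_lower_texture_resolution:
        return "memory bandwidth: too much texture data"
    return "unknown: keep profiling"
```

For example, a run with normal drawcalls and game thread time that only speeds up with a lower screen percentage points at the GPU.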
That's all folks
Let me know if you have any questions or smart insights you would like to share.
Technical Artist | Shaders | Game Assets | Designer | VFX | Composer
5 years ago: And how do you deal with anti-aliasing for mobile VR, and pixelated geometry?
GeForce DevTech Engineer at NVIDIA
6 years ago: If you are interested, you can also try the following:
- Reduce or even completely switch off anisotropic filtering on distant objects.
- Use compressed textures if possible (from what I recall UE4 does that by default).
- Ensure your textures are always power of two. Some devices have difficulty working with non-power-of-two textures.
- In addition to a reduced LOD, a distant object should also have a simpler shader (preferably the same shader, so that multiple objects can be batch-rendered).
- Distant lights can be merged into a single light.
- Experiment with a lower precision format for your shadowmap render target.