Instantly Boost Unity Game Performance With IL2CPP_USE_SPARSEHASH


For a better experience (especially better code highlighting), check out the original post on my blog: https://gamedev.center/instantly-boost-unity-game-performance-with-il2cpp_use_sparsehash/


Unity abstracts away many low-level details, allowing game developers to focus on creating amazing experiences. IL2CPP (Intermediate Language to C++ Compiler) is a prime example of such a technology that simplifies development in most cases. However, as your game scales and grows in complexity, you may need to dive deeper into the internals of abstracted engine components like IL2CPP. In this post, we will uncover one of these low-level details to boost the performance of your game in one easy step.

To keep this post concise, it won't go too deep into IL2CPP internals. If you want to learn more about IL2CPP, the docs are a good starting point. Another is one of the best series on the Unity blog: An introduction to IL2CPP internals, Method calls, A tour of generated code, and Debugging tips for generated code. Jackson Dunstan's blog also provides valuable insights into IL2CPP internals.

However, one topic is barely covered anywhere: the metadata generated for each type and used for things like virtual method invocation. More importantly, you won't find anything online about how this metadata is stored in memory, or about the existence of the IL2CPP_USE_SPARSEHASH define. So let's examine the internals available to us in the generated C++ code to learn more about it. Then we'll see how this knowledge can significantly boost the performance of some operations in our games.



IL2CPP Metadata

Metadata is structured data that describes all the elements within the compiled assemblies, such as types, methods, fields, properties, events, and more. It helps the IL2CPP runtime understand the structure and behavior of the original C# code. Metadata serves many purposes, and I won't try to provide an exhaustive list here. The main usages are listed below (if you think something significant is missing, add it in the comments):

  • Reflection
  • Virtual method invocation
  • Debug information (source code mapping)
  • Instance creation and initialization

You can be sure that every project uses most of these. Moreover, every relatively big mobile game I have ever worked on used a dependency injection framework. The vast majority of DI implementations rely heavily on reflection, and reflection can take a toll on performance. So even if you think you don't use reflection, third-party libraries in your game might, and they can affect performance.
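The toy C++ model below makes the first two usages concrete. It is not IL2CPP's actual data layout. Each object points at a per-type metadata record, and a virtual call jumps through a slot table in that record. A reflection-style lookup finds a record by name through a hash map. TypeInfo, Object, CallVirtual, and FindType are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, heavily simplified metadata. IL2CPP's real structures
// (Il2CppClass, MethodInfo, ...) carry far more information.
struct Object;
using MethodPtr = int (*)(const Object*);

struct TypeInfo {
    std::string name;               // used by name-based (reflection-style) lookups
    const TypeInfo* parent;         // base type, nullptr for the root type
    std::vector<MethodPtr> vtable;  // one slot per virtual method
};

struct Object {
    const TypeInfo* klass;  // every object points at its type's metadata
    int value;
};

// Virtual call: the target is found through the object's metadata, by slot.
int CallVirtual(const Object* obj, std::size_t slot) {
    return obj->klass->vtable.at(slot)(obj);
}

// Reflection-style lookup: metadata is found by name through a hash map.
using TypeRegistry = std::unordered_map<std::string, const TypeInfo*>;

const TypeInfo* FindType(const TypeRegistry& registry, const std::string& name) {
    auto it = registry.find(name);
    return it == registry.end() ? nullptr : it->second;
}

// Two implementations of virtual slot 0 ("GetValue").
int BaseGetValue(const Object* o) { return o->value; }
int DerivedGetValue(const Object* o) { return o->value * 2; }
```

In this toy model, every name-based lookup pays whatever the hash map's lookup costs. That is why the choice of map implementation matters once the number of types grows.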


How Metadata Is Stored In Memory

Metadata is generated for each type in a game. Extensive use of generics (e.g., with UniTask or dependency injection frameworks) can lead to a huge number of types. That makes the data structure used for storing metadata a critical factor in game performance. Let's dive into the generated C++ project to understand Unity's approach to this challenge.

You might ask where to start. Here is my thought process. First, generate a C++ project instead of a game executable. To do that, enable the option in the Build Settings that produces a Visual Studio solution instead of an executable.

Now we can open the .sln file in the IDE of your choice. I prefer Rider for this.

I used full-text search to find the code generated for one of the classes in my Unity project. Generated methods that need runtime metadata start with a one-time call to il2cpp_codegen_initialize_runtime_metadata:

// C#
public class ReflectionInvoker : MonoBehaviour
{
    private void Awake()
    {
        Application.targetFrameRate = 30;
    }
    ...
}

//C++
IL2CPP_EXTERN_C IL2CPP_METHOD_ATTR void ReflectionInvoker_Awake_m6407C71AF48BCF2E9C5C76E3E212A964DB8DD494 (ReflectionInvoker_t7CC7A545463AF1F7B50035A6C116E12F340E6E7C* __this, const RuntimeMethod* method) 
{
    static bool s_Il2CppMethodInitialized;
    if (!s_Il2CppMethodInitialized)
    {   
        il2cpp_codegen_initialize_runtime_metadata((uintptr_t*)&Application_tDB03BE91CDF0ACA614A5E0B67CFB77C44EB19B21_il2cpp_TypeInfo_var);
        s_Il2CppMethodInitialized = true;
    }
    ...
}        
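The guard in the generated method is a lazy, once-per-method initialization. The sketch below has the same shape. InitializeRuntimeMetadataStub and g_ApplicationTypeInfoVar are hypothetical stand-ins for il2cpp_codegen_initialize_runtime_metadata and the *_il2cpp_TypeInfo_var slot. The stub only counts calls and writes a placeholder value, because what the real function stores in the slot is not covered in this post.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counter so the lazy pattern can be observed.
static int g_metadataLookups = 0;

// Stand-in for il2cpp_codegen_initialize_runtime_metadata. The real function
// forwards to il2cpp::vm::MetadataCache::InitializeRuntimeMetadata; this stub
// only counts calls and stores a placeholder value in the slot.
void InitializeRuntimeMetadataStub(std::uintptr_t* metadataPointer) {
    ++g_metadataLookups;
    *metadataPointer = 0x1234;  // placeholder, not a real Il2CppClass*
}

// Hypothetical slot, like Application_..._il2cpp_TypeInfo_var.
static std::uintptr_t g_ApplicationTypeInfoVar = 0;

// Same shape as the generated ReflectionInvoker_Awake body.
void GeneratedAwake() {
    static bool s_Il2CppMethodInitialized;  // static storage: starts as false
    if (!s_Il2CppMethodInitialized) {
        InitializeRuntimeMetadataStub(&g_ApplicationTypeInfoVar);
        s_Il2CppMethodInitialized = true;
    }
    // ... method body (e.g. setting Application.targetFrameRate) ...
}
```

Because of the guard, the metadata work runs only on a method's first call. Its cost therefore shows up mostly during startup and loading, when many methods run for the first time.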

Then I looked for the definition of il2cpp_codegen_initialize_runtime_metadata, and il2cpp::vm::MetadataCache caught my eye:

//il2cpp-codegen-il2cpp.cpp
void il2cpp_codegen_initialize_runtime_metadata(uintptr_t* metadataPointer)
{
    il2cpp::vm::MetadataCache::InitializeRuntimeMetadata(metadataPointer);
 
    // We don't need a memory barrier here, InitializeRuntimeMetadata already has one
    // What we need is a barrier before setting s_Il2CppMethodInitialized = true in the generated code
    // but adding that to every function increases code size, so instead we rely on this function
    // being called before s_Il2CppCodeRegistrationInitialized is set to true
    il2cpp::os::Atomic::FullMemoryBarrier();
}        

And in MetadataCache.h we see Il2CppHashMap:

typedef Il2CppHashMap<const char*, Il2CppClass*, il2cpp::utils::StringUtils::StringHasher<const char*>, il2cpp::utils::VmStringUtils::CaseSensitiveComparer> WindowsRuntimeTypeNameToClassMap;
typedef Il2CppHashMap<const Il2CppClass*, const char*, il2cpp::utils::PointerHash<Il2CppClass> > ClassToWindowsRuntimeTypeNameMap;
typedef Il2CppHashMap<il2cpp::metadata::Il2CppSignature, int32_t, il2cpp::metadata::Il2CppSignatureHash, il2cpp::metadata::Il2CppSignatureCompare> Il2CppUnresolvedSignatureMap;
typedef Il2CppHashMap<const Il2CppGenericMethod*, const Il2CppGenericMethodIndices*, il2cpp::metadata::Il2CppGenericMethodHash, il2cpp::metadata::Il2CppGenericMethodCompare> Il2CppMethodTableMap;        

That leads us to Il2CppOutputProject\IL2CPP\libil2cpp\utils\Il2CppHashMap.h:22, and here it is, the hash map used for storing the metadata:

//Il2CppOutputProject\IL2CPP\libil2cpp\utils\Il2CppHashMap.h
...
template<class Key, class T,
         class HashFcn = SPARSEHASH_HASH<Key>,
         class EqualKey = std::equal_to<Key>,
         class Alloc = GOOGLE_NAMESPACE::libc_allocator_with_realloc<std::pair<const KeyWrapper<Key>, T> > >
#if IL2CPP_USE_SPARSEHASH
class Il2CppHashMap : public GOOGLE_NAMESPACE::sparse_hash_map<KeyWrapper<Key>, T, HashFcn, typename KeyWrapper<Key>::template EqualsComparer<EqualKey>, Alloc>
#else
class Il2CppHashMap : public GOOGLE_NAMESPACE::dense_hash_map<KeyWrapper<Key>, T, HashFcn, typename KeyWrapper<Key>::template EqualsComparer<EqualKey>, Alloc>
#endif
...        

We can see that the cache uses sparse_hash_map or dense_hash_map based on the IL2CPP_USE_SPARSEHASH definition. Let’s see how one is selected and whether we can configure it.


How the Il2CppHashMap Type Is Configured

I continued the journey through the generated C++ project. A quick text search in the project for IL2CPP_USE_SPARSEHASH led me to Il2CppOutputProject\IL2CPP\libil2cpp\il2cpp-config.h:576

#define IL2CPP_USE_SPARSEHASH (IL2CPP_TARGET_ANDROID || IL2CPP_TARGET_IOS)        

It means that the sparse hash map is used on mobile platforms (Android and iOS), while the dense one is used on other platforms such as PC and consoles. il2cpp-config.h must come from somewhere, and I already had experience using Unity templates from my previous post about crashes on iOS and how to fix them. Indeed, the file lives in "Unity installation directory"\Editor\Data\il2cpp\libil2cpp\il2cpp-config.h. Changes made there are reflected in the generated C++ project, so this behavior is configurable, which is amazing. For example, this forces the sparse hash map on every platform, including PC:

#define IL2CPP_USE_SPARSEHASH 1        
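The sketch below is minimal and self-contained, and it reproduces the same mechanism: a define derived from platform macros picks the base class of a map template at compile time. The DEMO_* macros and DemoHashMap are hypothetical. std::map and std::unordered_map are only stand-ins so it compiles without Google's sparsehash headers. They are not sparse or dense hash maps.

```cpp
#include <cassert>
#include <cstring>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical platform macros standing in for IL2CPP_TARGET_ANDROID and
// IL2CPP_TARGET_IOS; a real build system would define them per platform.
#ifndef DEMO_TARGET_ANDROID
#define DEMO_TARGET_ANDROID 0
#endif
#ifndef DEMO_TARGET_IOS
#define DEMO_TARGET_IOS 0
#endif

// Same idea as the il2cpp-config.h line: derived from the target platform.
// Changing this one define switches every map that uses it.
#ifndef DEMO_USE_SPARSEHASH
#define DEMO_USE_SPARSEHASH (DEMO_TARGET_ANDROID || DEMO_TARGET_IOS)
#endif

// The base class is chosen by the preprocessor, as in Il2CppHashMap.
template <class Key, class T>
#if DEMO_USE_SPARSEHASH
class DemoHashMap : public std::map<Key, T>            // stand-in for sparse_hash_map
#else
class DemoHashMap : public std::unordered_map<Key, T>  // stand-in for dense_hash_map
#endif
{
public:
    static const char* Backend() { return DEMO_USE_SPARSEHASH ? "sparse" : "dense"; }
};
```

Compiling with -DDEMO_TARGET_ANDROID=1 or -DDEMO_USE_SPARSEHASH=1 flips Backend() to "sparse" without touching any code that uses DemoHashMap. Editing the single define in il2cpp-config.h has the same kind of global effect.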

Now the question is whether it makes sense to change it. I believe Unity opted for the sparse version on mobile because it uses memory more efficiently. But what about runtime performance?


Performance Tests

sparse_hash_map vs dense_hash_map

The GOOGLE_NAMESPACE::sparse_hash_map in the source code hints at where to find the implementation of these hash maps: Google's sparsehash library. Fortunately, it is open source and well documented, and it even comes with performance tests:

SPARSE_HASH_MAP:
map_grow                  665 ns
map_predict/grow          303 ns
map_replace               177 ns
map_fetch                 117 ns
map_remove                192 ns
memory used in map_grow    84.3956 Mbytes

DENSE_HASH_MAP:
map_grow                   84 ns
map_predict/grow           22 ns
map_replace                18 ns
map_fetch                  13 ns
map_remove                 23 ns
memory used in map_grow   256.0000 Mbytes        

The choice of using a sparse hash map for metadata storage becomes evident when considering its significantly lower memory consumption, which can be critical for low-end devices with limited RAM. However, this memory optimization comes at the cost of more time spent on operations involving the hash map.
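Why is the sparse map smaller but slower? According to the sparsehash implementation notes, sparse_hash_map is built on a sparsetable. The sparsetable splits buckets into small groups (48 by default). Each group stores only its occupied entries, packed together, plus a bitmap of which buckets are occupied. Below is a toy version of one such group, not Google's code, and SparseGroup and its methods are hypothetical. Reading bucket i requires counting the set bits below i, and inserting shifts later entries. A dense table, by contrast, simply indexes a flat array in which every bucket is allocated.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy sparse group: a bitmap of occupied buckets plus a packed array that
// holds only the occupied ones, in bucket order. Requires C++17.
template <class T, std::size_t GroupSize = 48>
class SparseGroup {
public:
    std::optional<T> Get(std::size_t i) const {
        if (!bits_.test(i)) return std::nullopt;  // empty bucket costs one bit
        return values_[Rank(i)];
    }

    void Set(std::size_t i, const T& value) {
        const std::size_t pos = Rank(i);
        if (bits_.test(i)) {  // bucket already occupied: overwrite in place
            values_[pos] = value;
            return;
        }
        // New entry: later entries shift right, unlike a flat dense array.
        values_.insert(values_.begin() + static_cast<std::ptrdiff_t>(pos), value);
        bits_.set(i);
    }

    std::size_t Occupied() const { return values_.size(); }

private:
    // Packed index of bucket i = number of occupied buckets before it (popcount).
    std::size_t Rank(std::size_t i) const {
        std::bitset<GroupSize> below;
        below.set();
        below >>= (GroupSize - i);  // keeps only bits [0, i)
        return (bits_ & below).count();
    }

    std::bitset<GroupSize> bits_;
    std::vector<T> values_;
};
```

Each empty bucket costs a single bit here instead of a full slot, and that largely explains the memory gap in the benchmark above. The popcount on every access and the shifting on inserts largely explain the time gap.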


sparse_hash_map vs dense_hash_map in Unity

To evaluate the impact of metadata usage, let's consider one of the most common cases: resolving dependencies in the composition root via a dependency injection (DI) framework. Here we'll use Zenject, a framework that relies heavily on reflection and instance creation, so it can put a significant load on the MetadataCache. For this test, I created a class generator that produces 25,000 classes, a binder for all of them, and 1,250 classes that inject 20 dependencies each. The source code for this test is available on GitHub (don't mind the code quality, since the generator itself was entirely generated by ChatGPT). To measure the results, I used Zenject's internal profiling, enabled with the ZEN_INTERNAL_PROFILING define. All tests were run on a PC with the following specs:

13th Gen Intel(R) Core(TM) i7-13700KF   3.40 GHz
GeForce RTX 4070 Ti        

Here are the results:

No reflection baking
26250 classes and bindings

| Run | IL2CPP_USE_SPARSEHASH 1, Total Time (ms) | IL2CPP_USE_SPARSEHASH 0, Total Time (ms) |
|-----|------------------------------------------|------------------------------------------|
| 1   | 5301.64                                  | 2251.10                                  |
| 2   | 5130.90                                  | 2203.91                                  |
| 3   | 5045.09                                  | 2205.35                                  |

Detailed Results:

No Reflection baking
26250 classes and bindings 

IL2CPP_USE_SPARSEHASH 1
WindowsPlayer "_" SceneContext.Awake detailed profiling: Total time tracked: 5301,64 ms.  Details:
  83,2% (52537x) (4410 ms) User Code
  11,6% (26274x) (0618 ms) Type Analysis - Direct Reflection
  01,8% (26259x) (0097 ms) DiContainer.Instantiate
  01,6% (51319x) (0085 ms) DiContainer.Resolve
  01,2% (26274x) (0063 ms) Type Analysis - Calling Baked Reflection Getter
  00,4% (00001x) (0019 ms) Other
  00,2% (26266x) (0009 ms) DiContainer.Inject
  00,0% (00001x) (0000 ms) GameObject.Instantiate
  00,0% (00008x) (0000 ms) Searching Hierarchy
  00,0% (00011x) (0000 ms) DiContainer.Bind

WindowsPlayer "_" SceneContext.Awake detailed profiling: Total time tracked: 5130,90 ms.  Details:
  84,9% (52537x) (4356 ms) User Code
  10,6% (26274x) (0542 ms) Type Analysis - Direct Reflection
  01,7% (26259x) (0085 ms) DiContainer.Instantiate
  01,4% (51319x) (0070 ms) DiContainer.Resolve
  01,0% (26274x) (0052 ms) Type Analysis - Calling Baked Reflection Getter
  00,4% (00001x) (0018 ms) Other
  00,1% (26266x) (0008 ms) DiContainer.Inject
  00,0% (00001x) (0000 ms) GameObject.Instantiate
  00,0% (00008x) (0000 ms) Searching Hierarchy
  00,0% (00011x) (0000 ms) DiContainer.Bind

WindowsPlayer "_" SceneContext.Awake detailed profiling: Total time tracked: 5045,09 ms.  Details:
  84,4% (52537x) (4258 ms) User Code
  11,0% (26274x) (0553 ms) Type Analysis - Direct Reflection
  01,7% (26259x) (0084 ms) DiContainer.Instantiate
  01,4% (51319x) (0071 ms) DiContainer.Resolve
  01,0% (26274x) (0053 ms) Type Analysis - Calling Baked Reflection Getter
  00,4% (00001x) (0018 ms) Other
  00,1% (26266x) (0008 ms) DiContainer.Inject
  00,0% (00001x) (0000 ms) GameObject.Instantiate
  00,0% (00011x) (0000 ms) DiContainer.Bind
  00,0% (00008x) (0000 ms) Searching Hierarchy


IL2CPP_USE_SPARSEHASH 0
WindowsPlayer "_" SceneContext.Awake detailed profiling: Total time tracked: 2251,10 ms.  Details:
  64,5% (52537x) (1452 ms) User Code
  24,1% (26274x) (0543 ms) Type Analysis - Direct Reflection
  04,2% (26259x) (0094 ms) DiContainer.Instantiate
  03,5% (51319x) (0078 ms) DiContainer.Resolve
  02,7% (26274x) (0060 ms) Type Analysis - Calling Baked Reflection Getter
  00,7% (00001x) (0015 ms) Other
  00,3% (26266x) (0008 ms) DiContainer.Inject
  00,0% (00001x) (0000 ms) GameObject.Instantiate
  00,0% (00008x) (0000 ms) Searching Hierarchy
  00,0% (00011x) (0000 ms) DiContainer.Bind

WindowsPlayer "_" SceneContext.Awake detailed profiling: Total time tracked: 2203,91 ms.  Details:
  68,4% (52537x) (1507 ms) User Code
  21,5% (26274x) (0475 ms) Type Analysis - Direct Reflection
  03,8% (26259x) (0084 ms) DiContainer.Instantiate
  03,0% (51319x) (0067 ms) DiContainer.Resolve
  02,3% (26274x) (0051 ms) Type Analysis - Calling Baked Reflection Getter
  00,6% (00001x) (0014 ms) Other
  00,3% (26266x) (0006 ms) DiContainer.Inject
  00,0% (00001x) (0000 ms) GameObject.Instantiate
  00,0% (00008x) (0000 ms) Searching Hierarchy
  00,0% (00011x) (0000 ms) DiContainer.Bind

WindowsPlayer "_" SceneContext.Awake detailed profiling: Total time tracked: 2205,35 ms.  Details:
  67,8% (52537x) (1495 ms) User Code
  22,0% (26274x) (0486 ms) Type Analysis - Direct Reflection
  03,8% (26259x) (0084 ms) DiContainer.Instantiate
  03,0% (51319x) (0066 ms) DiContainer.Resolve
  02,4% (26274x) (0053 ms) Type Analysis - Calling Baked Reflection Getter
  00,7% (00001x) (0014 ms) Other
  00,3% (26266x) (0006 ms) DiContainer.Inject
  00,0% (00001x) (0000 ms) GameObject.Instantiate
  00,0% (00008x) (0000 ms) Searching Hierarchy
  00,0% (00011x) (0000 ms) DiContainer.Bind        

It’s interesting that Zenject reports the user code as the bottleneck:

  83,2% (52537x) (4410 ms) User Code
  11,6% (26274x) (0618 ms) Type Analysis - Direct Reflection        

However, the dense hash map clearly speeds up both categories. You can inspect the "user code" in my repo by checking ClassGenerator.cs, or by cloning the repo and generating the test code locally (I didn't want to push over 26,000 generated files). All it contains is bindings and empty classes that use constructor injection. So we can reasonably assume that almost all of the tracked time is spent on the Zenject side.

The results are very clear: the dense hash map is more than twice as fast. And this is on a high-end PC, so on low-end mobile devices the absolute cost is likely much higher. The sparse map saves memory, but it can take a heavy toll on loading times, which are crucial in today's competitive mobile market. I wouldn't say that 26k types is a lot for a really big mobile game, so the scenario is realistic. Advanced Zenject or VContainer users might point out that reflection baking removes much of the runtime reflection cost. If so, we could keep the sparse hash map and its lower memory consumption. Let's investigate that too.


sparse_hash_map vs dense_hash_map With Reflection Baking

Reflection baking
26250 classes and bindings

| Run | IL2CPP_USE_SPARSEHASH 1, Total Time (ms) | IL2CPP_USE_SPARSEHASH 0, Total Time (ms) |
|-----|------------------------------------------|------------------------------------------|
| 1   | 4430.24                                  | 1741.95                                  |
| 2   | 4488.11                                  | 1751.57                                  |
| 3   | 4473.64                                  | 1752.00                                  |

Detailed Results

Reflection baking
26250 classes and bindings 

IL2CPP_USE_SPARSEHASH 1
WindowsPlayer "_" SceneContext.Awake detailed profiling: Total time tracked: 4430,24 ms.  Details:
  93,1% (52538x) (4123 ms) User Code
  03,6% (26274x) (0159 ms) Type Analysis - Calling Baked Reflection Getter
  01,5% (26259x) (0068 ms) DiContainer.Instantiate
  01,2% (51319x) (0052 ms) DiContainer.Resolve
  00,4% (00001x) (0017 ms) Other
  00,2% (26266x) (0007 ms) DiContainer.Inject
  00,1% (00022x) (0003 ms) Type Analysis - Direct Reflection
  00,0% (00011x) (0000 ms) DiContainer.Bind
  00,0% (00009x) (0000 ms) Searching Hierarchy
  00,0% (00001x) (0000 ms) GameObject.Instantiate

WindowsPlayer "_" SceneContext.Awake detailed profiling: Total time tracked: 4473,64 ms.  Details:
  93,0% (52538x) (4161 ms) User Code
  03,6% (26274x) (0161 ms) Type Analysis - Calling Baked Reflection Getter
  01,5% (26259x) (0067 ms) DiContainer.Instantiate
  01,2% (51319x) (0054 ms) DiContainer.Resolve
  00,4% (00001x) (0019 ms) Other
  00,2% (26266x) (0007 ms) DiContainer.Inject
  00,1% (00022x) (0004 ms) Type Analysis - Direct Reflection
  00,0% (00011x) (0000 ms) DiContainer.Bind
  00,0% (00009x) (0000 ms) Searching Hierarchy
  00,0% (00001x) (0000 ms) GameObject.Instantiate
  
WindowsPlayer "_" SceneContext.Awake detailed profiling: Total time tracked: 4488,11 ms.  Details:
  93,1% (52538x) (4180 ms) User Code
  03,5% (26274x) (0157 ms) Type Analysis - Calling Baked Reflection Getter
  01,5% (26259x) (0068 ms) DiContainer.Instantiate
  01,2% (51319x) (0055 ms) DiContainer.Resolve
  00,4% (00001x) (0017 ms) Other
  00,2% (26266x) (0007 ms) DiContainer.Inject
  00,1% (00022x) (0003 ms) Type Analysis - Direct Reflection
  00,0% (00009x) (0000 ms) Searching Hierarchy
  00,0% (00011x) (0000 ms) DiContainer.Bind
  00,0% (00001x) (0000 ms) GameObject.Instantiate



IL2CPP_USE_SPARSEHASH 0
WindowsPlayer "_" SceneContext.Awake detailed profiling: Total time tracked: 1741,95 ms.  Details:
  80,8% (52538x) (1408 ms) User Code
  10,1% (26274x) (0175 ms) Type Analysis - Calling Baked Reflection Getter
  03,9% (26259x) (0068 ms) DiContainer.Instantiate
  03,8% (51319x) (0067 ms) DiContainer.Resolve
  00,9% (00001x) (0015 ms) Other
  00,3% (26266x) (0006 ms) DiContainer.Inject
  00,2% (00022x) (0003 ms) Type Analysis - Direct Reflection
  00,0% (00001x) (0000 ms) GameObject.Instantiate
  00,0% (00011x) (0000 ms) DiContainer.Bind
  00,0% (00009x) (0000 ms) Searching Hierarchy

Autoconnected Player "Autoconnected Player" SceneContext.Awake detailed profiling: Total time tracked: 1751,57 ms.  Details:
  81,6% (52538x) (1429 ms) User Code
  09,9% (26274x) (0173 ms) Type Analysis - Calling Baked Reflection Getter
  03,8% (26259x) (0066 ms) DiContainer.Instantiate
  03,4% (51319x) (0059 ms) DiContainer.Resolve
  00,8% (00001x) (0015 ms) Other
  00,3% (26266x) (0006 ms) DiContainer.Inject
  00,2% (00022x) (0003 ms) Type Analysis - Direct Reflection
  00,0% (00011x) (0000 ms) DiContainer.Bind
  00,0% (00009x) (0000 ms) Searching Hierarchy
  00,0% (00001x) (0000 ms) GameObject.Instantiate

WindowsPlayer "_" SceneContext.Awake detailed profiling: Total time tracked: 1752,00 ms.  Details:
  81,7% (52538x) (1432 ms) User Code
  09,7% (26274x) (0170 ms) Type Analysis - Calling Baked Reflection Getter
  03,8% (26259x) (0066 ms) DiContainer.Instantiate
  03,5% (51319x) (0061 ms) DiContainer.Resolve
  00,8% (00001x) (0014 ms) Other
  00,3% (26266x) (0006 ms) DiContainer.Inject
  00,2% (00022x) (0003 ms) Type Analysis - Direct Reflection
  00,0% (00009x) (0000 ms) Searching Hierarchy
  00,0% (00011x) (0000 ms) DiContainer.Bind
  00,0% (00001x) (0000 ms) GameObject.Instantiate        

With reflection baking, Type Analysis - Direct Reflection is mostly replaced by Type Analysis - Calling Baked Reflection Getter, and the total time is indeed lower. However, the dense hash map is still much faster. Apart from reflection, many other operations go through the hash map, and they can tank performance too.
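The helper below puts numbers on "more than twice as fast". It averages the three total times per configuration and divides the two averages. The Scenario struct and the function names are only for illustration, and the inputs are the totals reported above.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <numeric>

// Total times (ms) from the three runs of each configuration.
struct Scenario {
    std::array<double, 3> sparse;  // IL2CPP_USE_SPARSEHASH 1
    std::array<double, 3> dense;   // IL2CPP_USE_SPARSEHASH 0
};

double Mean(const std::array<double, 3>& runs) {
    return std::accumulate(runs.begin(), runs.end(), 0.0) / runs.size();
}

// How many times faster the dense configuration is, on average.
double Speedup(const Scenario& s) { return Mean(s.sparse) / Mean(s.dense); }

const Scenario kNoBaking{{5301.64, 5130.90, 5045.09}, {2251.10, 2203.91, 2205.35}};
const Scenario kBaking{{4430.24, 4488.11, 4473.64}, {1741.95, 1751.57, 1752.00}};
```

With these inputs, the dense configuration averages about 2.32x faster without reflection baking and about 2.55x faster with it. Baking lowers the absolute times, but it actually widens the relative gap.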


Conclusion

It was a great investigation, and I was surprised that searching for IL2CPP_USE_SPARSEHASH online returned absolutely zero references: no info in the Unity docs, no posts, no videos. I'm happy to share this, and now you have one more tool for improving your game's performance. Ultimately, that means a better experience for your players and better games overall. All the tests in this post were run on PC, but on mobile, especially on low-end devices, the dense hash map also brings a significant improvement in loading times. So give it a try. There are other techniques for improving dependency resolution times in DI frameworks, but this one is very easy to implement and test. I would love to hear in the comments about your games and how this advice helped you improve their performance.

Key Takeaways

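  • Setting IL2CPP_USE_SPARSEHASH to 0 in "Unity installation directory"\Editor\Data\il2cpp\libil2cpp\il2cpp-config.h makes IL2CPP use dense_hash_map instead of sparse_hash_map. That gives a significant performance boost in cases that rely heavily on metadata in the generated C++ code, such as dependency resolution in DI frameworks. Non-mobile platforms already default to 0, so the change matters for Android and iOS builds, and it comes at the cost of higher memory usage.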
  • Don’t trust anyone, including me. Profile your particular case, and make sound decisions about configuring your game, especially at such an internal level, based on your data and not information from the internet. I’ve shared the tool with you, but you must do the profiling yourself.
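For reference, here is the change from the first takeaway as a config fragment. It assumes the il2cpp-config.h layout described earlier, and the exact line number may differ between Unity versions. Because the file lives in the Editor installation, the edit applies to every project built with that installation.

```cpp
// "Unity installation directory"\Editor\Data\il2cpp\libil2cpp\il2cpp-config.h
// Default: sparse_hash_map on Android and iOS, dense_hash_map elsewhere.
// #define IL2CPP_USE_SPARSEHASH (IL2CPP_TARGET_ANDROID || IL2CPP_TARGET_IOS)

// Use dense_hash_map for Il2CppHashMap on all platforms
// (faster operations, higher memory usage).
#define IL2CPP_USE_SPARSEHASH 0
```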



More articles by Alexey Merzlikin