Modern Memory Allocation Best Practices in .NET 9 and .NET 10

Efficient memory allocation is at the heart of high-performance .NET applications. With the release of .NET 9 and the upcoming .NET 10, the runtime has introduced significant improvements to manage memory more effectively. This article explores modern best-practice approaches to memory allocation in .NET, focusing on how .NET 9/10 address classic allocation challenges like central allocator bottlenecks, arena contention, memory fragmentation spirals, thread-local cache pollution, and cross-thread deallocations. We'll dive into each of these issues – explaining their root causes and how the latest .NET runtime mitigates them – and provide practical examples, tools, and tips that .NET developers can apply in real-world projects.

Throughout this article, you'll find code examples that demonstrate proven techniques for efficient memory management in modern .NET applications. These examples represent best practices for avoiding common memory allocation pitfalls and leveraging the performance benefits of the latest .NET runtime improvements.

The Challenges of Memory Allocators (TCMalloc, Jemalloc, etc.)

Before examining .NET specifics, it's useful to understand problems identified in native memory allocators (like TCMalloc and Jemalloc) that inspired many modern improvements:

  • Central Cache Bottlenecks: Traditional allocators often use a global free-memory pool. Under concurrency, this central cache can become a hot lock or bottleneck if every thread must frequently access it. TCMalloc (Thread-Caching Malloc) addressed this by giving each thread a private cache and only occasionally syncing with a central list. This greatly reduces lock contention, but any time threads do interact with the central free list (e.g. grabbing a new batch of memory or releasing a batch back), there is still a lock per size class. In extreme allocation patterns, even those fine-grained locks can see contention or latency spikes if many threads simultaneously hit the central allocator.
  • Multiple Arena Contention: Jemalloc and some libc allocators use multiple arenas (heaps) to avoid one global lock. For example, on Linux the default malloc can create up to 8 arenas per core to service threads. The idea is that threads allocate from their own arena, reducing cross-thread locking. This improves throughput but comes at a cost: memory is partitioned into many independent arenas that cannot share free space. If one arena is holding free memory that another thread could use, that memory sits idle because allocators don't migrate free blocks between arenas. In practice, multiple arenas can lead to significant memory fragmentation, with overall memory usage blowing up 2–3× due to unusable gaps.
  • Memory Fragmentation Spirals: Fragmentation occurs when freed memory is not usable for new allocations, typically because of size or ordering mismatches. Over time, small "holes" of free memory scatter between live blocks, and the allocator may be forced to request more OS memory even though plenty is free in aggregate. This can spiral: as the process grows, fragmentation creates more waste, leading to even more allocations. For instance, if a 1 KB object remains allocated between two freed blocks of 40 KB and 100 KB, the 140 KB combined free space can't be returned to the OS until that 1 KB is freed. The allocator will instead request new memory for the next 100 KB allocation, leaving the free chunks in a fragmented state. Over time, such fragmentation bloats memory usage and can hurt performance (more paging and cache misses).
  • Thread Cache Pollution: Modern allocators with thread-local caches (like TCMalloc and Jemalloc) introduce another quirk: when memory allocated by one thread is freed by a different thread, the freed block often ends up in the cache of the freeing thread (since that thread performs the free). If many threads free each other's allocations, each thread's local cache can fill with blocks it never allocated itself – effectively "polluting" caches with cross-thread memory. This bloats memory because those caches grow, and memory sits unused in the wrong place. A real example was observed with jemalloc: one thread pool allocated objects while a different pool freed them, and each freeing thread ended up holding ~2.5 MB in its cache, multiplying overall memory usage dramatically. Unless caches are manually trimmed or flushed, cross-thread deallocations can cause significant memory growth.
  • Cross-Thread Deallocations Overhead: In addition to memory bloat, freeing memory on a different thread than it was allocated can incur synchronization overhead. Without special handling, the allocator might need to acquire locks or perform atomic operations to safely free or transfer the memory. Optimized allocators often handle this by either transferring ownership of the freed block to a central structure or to the current thread's cache (as we saw above). Either way, there's a potential performance hit. The freeing thread might need to lock a global list (in a simple allocator), or it might stash the block in its cache (as in jemalloc), which avoids a lock but then ties up memory. Some custom allocators (e.g. rpmalloc, mimalloc) mitigate this by designing cross-thread frees to be lock-free and by periodically returning memory to a global pool or the origin thread. It's a complex dance to make cross-thread free both fast and memory-efficient.

With these concepts in mind, let's see how the .NET runtime – which has its own managed memory manager (the Garbage Collector) and also interacts with unmanaged allocators – tackles these issues in .NET 9 and .NET 10.

TCMalloc-Style Central Cache vs .NET's Allocator

TCMalloc's central cache issue: The central free lists in TCMalloc ensure no single global lock serializes all allocations (each size class has its own lock). However, in scenarios with dozens of threads churning allocations of the same size, even those per-size-class locks can see contention. An allocator's performance can dip if threads frequently empty or refill their local caches (causing frequent access to the locked central list). The goal is to service most alloc/free operations from thread-local memory to stay off the locks.

.NET's approach: .NET's managed heap takes a different approach that naturally avoids a lot of central-locking overhead. Small object allocations in .NET are done via a thread-local allocation buffer (TLAB) – each thread reserves a chunk of the heap and bumps a pointer to allocate new objects, completely lock-free in the common case. Only when a thread's local heap segment is exhausted does it need to get a new segment or trigger a GC, which involves coordination. This design is conceptually similar to thread-caching allocators but even faster for allocation because there's no per-object free operation (garbage collection handles reclamation in batches). .NET's generational GC means that freeing memory doesn't involve per-allocation locking at all – dead objects are identified en masse during GC, and the heap is compacted by moving live objects, rather than maintaining intricate free lists for small objects.
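You can observe this thread-local behavior directly: GC.GetAllocatedBytesForCurrentThread() reports how many bytes the current thread has allocated, and each thread can query it without synchronization. The snippet below is a small illustration (not a benchmark):

using System;
using System.Threading.Tasks;

public static class AllocationDemo
{
    public static void Run()
    {
        Parallel.For(0, 4, _ =>
        {
            long before = GC.GetAllocatedBytesForCurrentThread();

            for (int i = 0; i < 10_000; i++)
            {
                var tmp = new byte[256]; // served from this thread's allocation context (bump-pointer)
                GC.KeepAlive(tmp);
            }

            long after = GC.GetAllocatedBytesForCurrentThread();
            Console.WriteLine($"This thread allocated ~{(after - before) / 1024} KB");
        });
    }
}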

Mitigating central contention: In .NET 9, the GC was further optimized to handle allocation bursts with minimal contention. For example, .NET 9 server GC can allocate on multiple threads in parallel across multiple heap segments, so threads don't all stomp on one global heap. Even in workstation GC (one heap), the runtime uses atomic operations to extend the heap when needed, keeping lock usage minimal. The .NET team continuously refines these pathways – in .NET 9, they tuned the GC for high-memory and high-allocating environments so that it scales better without becoming a bottleneck. The result is that even under heavy multi-threaded allocation loads, .NET 9 experiences fewer pauses and less contention than earlier versions.

In cases where your .NET application does a lot of unmanaged allocations (e.g. via Marshal.AllocHGlobal or native interop), you could still face central allocator bottlenecks from the OS allocator. On Windows, the default heap is already pretty good with the Low Fragmentation Heap (LFH) for small blocks. On Linux, if you see contention in malloc, one trick is to use an alternative allocator like jemalloc or tcmalloc by loading it at runtime (e.g. using LD_PRELOAD for your process). However, for pure managed allocations, the .NET GC's thread-local design means you seldom need such measures. The key best practice is to upgrade to .NET 9+ – which by default gives you the latest allocator optimizations – and to use Server GC mode for high-throughput server applications. Server GC creates multiple heaps to reduce contention (more on that next), whereas Workstation GC uses one heap for all threads.
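You can verify at runtime which GC mode a process is actually using; the snippet below is a minimal check (the project-file property in the comment is the standard way to opt in):

using System;
using System.Runtime;

public static class GcModeCheck
{
    public static void Print()
    {
        // Server GC is typically enabled in the project file with
        // <ServerGarbageCollection>true</ServerGarbageCollection>
        // or via the DOTNET_gcServer=1 environment variable.
        Console.WriteLine($"Server GC: {GCSettings.IsServerGC}");
        Console.WriteLine($"Latency mode: {GCSettings.LatencyMode}");
    }
}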

Jemalloc Arena Contention vs .NET Server GC Heaps

Arena contention and fragmentation: Jemalloc's strategy of multiple arenas per process improves concurrency by reducing lock contention – threads are less likely to block each other when allocating. The trade-off, as noted, is potentially higher memory usage because free memory is siloed per arena. If one arena has a lot of free space and another runs out, the second cannot utilize the first's free space. All unused memory in each arena stays "available" but idle. In a long-running service with many threads, you can end up with dozens of arenas, each holding onto some free chunks that can't be reclaimed or reused by others, resulting in overall fragmentation. Studies have shown this can lead to 2–3× higher memory consumption for the process. Reducing the number of arenas (e.g. tuning MALLOC_ARENA_MAX in glibc) can improve memory reuse but then contention goes up – it's a classic throughput vs. memory trade-off.

.NET's approach with Server GC: The .NET garbage collector in Server GC mode also employs multiple heaps (one per logical CPU by default) to parallelize work and reduce contention. Each GC heap in server mode has its own allocation pointer and its own synchronization, so threads on different processors allocate in separate regions of memory. This is somewhat analogous to multiple arenas. The benefit is excellent throughput on multi-core machines – .NET can allocate on all threads concurrently, and during GC each heap is collected by a dedicated GC thread in parallel. The downside is that, like arenas, having more heaps can increase memory usage. The GC tries to balance load across heaps, but you might see higher memory footprint on a 32-core machine than on a 4-core machine for the same workload, simply because 32 heaps can collectively "hoard" more memory (each heap might keep some reserve). As Maoni Stephens (Microsoft's GC expert) described, server GC's memory use can scale with processor count, and it can be unpredictable – run the same app on a machine with more cores, and the GC might use a larger heap because it can spread out more. Each heap holds some long-lived objects, and each has its own free space that isn't immediately available to other heaps.

Improvements in .NET 9/10: .NET 9 introduced tuning to better handle large heaps and reduce fragmentation in high-allocation workloads. Moreover, Dynamic Adaptation to Application Sizes (DATAS) – introduced as an opt-in feature in .NET 8 and enabled by default for Server GC in .NET 9 – plays an increasingly important role. DATAS can adjust the number of GC heaps on the fly based on workload and environment. For example, if you have a small app running in a container with many CPU cores, the GC might decide not to use a heap per core (which would be overkill), thereby saving memory. Conversely, under heavy load it can scale up the heap count. This dynamic tuning helps avoid the scenario of too many heaps bloating memory or too few heaps causing contention. By .NET 10, we expect the GC to leverage these heuristics even more – making server GC smarter about balancing memory vs. throughput automatically.

Best practices: For .NET developers, the key is to use Server GC for server applications (ASP.NET Core, background services, etc.) to get the concurrency benefits. This is usually enabled by default in ASP.NET Core (or you can opt in via runtimeconfig or an environment variable). If you run into memory constraints (e.g. in containers), keep an eye on .NET runtime updates – DATAS (enabled by default with Server GC in .NET 9, and opt-in on .NET 8 via the DOTNET_GCDynamicAdaptationMode setting) can help the runtime use fewer heaps for small apps. Always measure: use performance counters or dotnet-counters to monitor GC heap size and check whether memory usage scales linearly with cores.

Taming Memory Fragmentation Spirals in .NET

Fragmentation in .NET: Memory fragmentation is not just a native code problem – managed heaps can fragment too, especially the Large Object Heap (LOH). In .NET, objects of 85,000 bytes (roughly 85 KB) or larger are allocated on the LOH, which historically is not compacted by the GC by default (because moving very large objects is expensive). This means that if you allocate and free large objects of varying sizes, the LOH can fill up with "holes" of freed space that are not usable for new large objects unless an equal-or-smaller object comes along. Over time, the process working set grows. This is very similar to native fragmentation issues. For example, if your app frequently allocates a 1 MB object, frees it, and allocates a 1.1 MB object, that 1 MB free space might not be reused (since it's slightly smaller than needed), and the runtime will allocate a fresh 1.1 MB from the OS. Patterns like this cause LOH fragmentation "spirals" where memory usage keeps climbing despite plenty of free space in aggregate. Large arrays are a common culprit – e.g. reading large JSON payloads into byte arrays or creating big image buffers can fragment the LOH if not managed.

Preventing LOH Fragmentation with Buffer Pooling:

using System.Buffers;
using System.IO;

public class DataProcessor
{
    private readonly ArrayPool<byte> _pool = ArrayPool<byte>.Shared;

    public void Process(Stream input)
    {
        byte[] buffer = _pool.Rent(1024 * 512); // 512 KB requested; the pool may hand back a larger array
        try
        {
            int bytesRead = input.Read(buffer, 0, buffer.Length); // Read may return fewer bytes than requested
            // Process buffer[0..bytesRead]
        }
        finally
        {
            _pool.Return(buffer);
        }
    }
}

This approach rents a buffer from a shared pool, avoiding repeated large allocations that could fragment the LOH. The ArrayPool<T> class intelligently manages a set of arrays that can be reused across your application.

Triggering LOH Compaction When Needed:

using System;
using System.Runtime;

public static class GCUtils
{
    public static void CompactLOH()
    {
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect(); // the next full, blocking collection compacts the LOH once
    }
}

This utility method can be called at safe application points (such as during low-traffic periods or scheduled maintenance windows) when you need to compact the LOH to reduce fragmentation.

.NET 9 improvements: Recognizing these challenges, .NET 9 made compaction smarter and more efficient. The GC now has an improved compaction algorithm that can reduce fragmentation with less overhead, meaning the runtime is more willing to compact memory when beneficial. In scenarios with large heaps, .NET 9 also showed better behavior – fewer pauses and more consistent memory usage even when handling tons of large objects. Additionally, .NET has provided a way to compact the LOH on demand for a while: you can call GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce and then trigger a GC, which forces a one-time LOH compaction. This can be useful at a safe point (like app startup or a maintenance window) if your LOH fragmentation is severe. .NET 9's enhancements mean you might need to do this less often, as the runtime itself tries to minimize LOH fragmentation.

Using Span<T> and Avoiding Allocations: Another key strategy to avoid memory fragmentation is to minimize allocations altogether, especially for short-lived operations. Modern .NET provides the Span<T> type which allows you to work with memory without making copies:

public void FastParse()
{
    Span<byte> buffer = stackalloc byte[128]; // Allocated on stack, not heap
    // fill and use buffer — no GC allocation happens
}        

For small buffers, stackalloc can be used to allocate memory on the stack instead of the heap, completely avoiding GC overhead. This is particularly useful for parsing operations and other short-lived data transformations.
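A common idiom that builds on this (a sketch, not a library API) is to stack-allocate when the payload is small and fall back to ArrayPool<T> for larger sizes:

using System;
using System.Buffers;

public static class BufferHelpers
{
    private const int StackLimit = 256;

    public static void Transform(ReadOnlySpan<byte> input)
    {
        byte[]? rented = null;
        Span<byte> scratch = input.Length <= StackLimit
            ? stackalloc byte[StackLimit]                           // small: no heap allocation at all
            : (rented = ArrayPool<byte>.Shared.Rent(input.Length)); // large: reuse pooled memory

        try
        {
            input.CopyTo(scratch);
            // ... transform scratch[..input.Length] in place ...
        }
        finally
        {
            if (rented != null)
                ArrayPool<byte>.Shared.Return(rented);
        }
    }
}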

Efficient JSON Processing with Span<T> and Utf8JsonReader:

using System;
using System.Buffers;
using System.Text.Json;

public class JsonParser
{
    public void Parse(ReadOnlySpan<byte> jsonUtf8)
    {
        var reader = new Utf8JsonReader(jsonUtf8);
        while (reader.Read())
        {
            if (reader.TokenType == JsonTokenType.PropertyName)
            {
                ReadOnlySpan<byte> propertyName = reader.ValueSpan;
                // handle property directly from the span - no string allocations!
            }
        }
    }
}        

This example shows how to parse JSON directly from UTF-8 bytes without allocating strings for property names or values. The Utf8JsonReader works with spans, allowing zero-copy parsing of JSON data.

Pinned object heap (POH): Another source of fragmentation in .NET was pinned objects. Pinning an object (via fixed or GCHandle.Alloc(obj, GCHandleType.Pinned)) prevents the GC from moving it, so pinned objects could create "islands" of immovable memory that forced the GC to skip compaction around them. .NET 5 introduced the Pinned Object Heap to segregate long-lived pinned objects from the rest of the heap. Buffers that you know will stay pinned for a long time can be allocated directly on the POH via GC.AllocateArray<T>(length, pinned: true), so they never intermix with normal objects. This isolation means the main heaps can compact more effectively without working around pinned blocks, thereby reducing fragmentation. The POH trade-off is similar to arenas: it's a separate space, so it reduces the fragmentation impact on the main heap, but within the POH itself objects are never moved (by design), so if you allocate and free many pinned buffers, the POH can fragment. The guidance is to pin only when necessary and keep pinned objects around only as long as needed.
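A minimal sketch of allocating directly on the POH (using the GC.AllocateArray API available since .NET 5):

using System;

public static class PinnedBuffers
{
    // Allocate a long-lived I/O buffer directly on the Pinned Object Heap.
    // The array never moves, so the main heaps can compact freely around it.
    public static byte[] AllocatePinnedIoBuffer(int size) =>
        GC.AllocateArray<byte>(size, pinned: true);
}

Buffers like this are typically created once (for example, as receive buffers handed to socket or interop APIs) and reused, rather than allocated per operation.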

Real-world example: Consider a data processing service that builds large byte arrays from streams. If it naively does byte[] buffer = new byte[500_000] for each request and then lets it go, the LOH will get cluttered with freed 500 KB blocks. Over time, those blocks may not be reusable if new requests need slightly different sizes. The process's memory will grow and you may even see Gen2 collections not freeing much memory because it's all fragmentation. Upgrading to .NET 9 helps by reducing some fragmentation automatically. But the best solution is to apply buffer pooling: reuse large byte arrays instead of constantly allocating new ones. .NET's ArrayPool<T> is a great tool here. For example, you can rent a 1 MB buffer once and use slices of it for those 500 KB chunks, or rent exactly the size needed and return it to the pool after use. This way, the memory gets reused and the LOH doesn't fragment as much. In fact, using pooling in .NET 8/9 along with the improved GC can practically eliminate fragmentation issues for many scenarios.

.NET 10 outlook: .NET 10 (the next LTS) is expected to further mitigate fragmentation by possibly making LOH compaction more automatic. It wouldn't be surprising if .NET 10's GC heuristics trigger a compaction of the LOH when fragmentation reaches a certain threshold, all behind the scenes. .NET 10 will also benefit from the continuing evolution of DATAS – meaning after a surge of allocations, the GC might proactively trim the heap size down (performing a compacting GC) when the memory pressure subsides, to avoid stranded free memory lingering indefinitely. From a developer perspective, the runtime is becoming smarter, but you should still design with fragmentation in mind: prefer streaming and chunking large data, reuse buffers, and avoid creating giant objects that live just briefly if you can.

Thread Cache Pollution & Cross-Thread Freeing in .NET

As described earlier, thread cache pollution occurs in native allocators when one thread frees memory that another thread allocated, leading to that memory being held in the wrong thread's cache. In unmanaged scenarios, this can cause surprising memory growth. How does this translate to .NET's world?

Managed memory case: In pure managed code, developers don't manually free objects – the GC handles it. That means we don't have an exact analog of "cross-thread free" for managed objects. If one thread creates an object and another thread drops the last reference to it, the GC will eventually reclaim it on a background thread (or one of the GC threads). The memory will go back to the managed heap's free space. There is no concept of per-thread object caches that hold freed objects for reuse – the GC either compacts them away or leaves holes to fill with future allocations. So, by design, .NET avoids thread cache pollution for managed objects; everything freed goes back to a common pool (the heap) rather than staying owned by a specific thread. This is one of the strengths of a garbage-collected system: deallocation is centralized and optimized in bulk, rather than per free call.

Custom Object Pooling for Reuse: When you need finer control over object reuse, you might implement custom pooling. Here's a simple implementation:

using System.Collections.Concurrent;

public class SimplePool<T> where T : new()
{
    private readonly ConcurrentBag<T> _items = new();

    public T Rent() => _items.TryTake(out var item) ? item : new T(); // reuse a pooled instance if one exists
    public void Return(T item) => _items.Add(item);
}

This pattern is particularly useful for short-lived but expensive-to-create objects like buffers, network connections, or complex data structures. By recycling objects across threads, you can significantly reduce allocation pressure.
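As a usage sketch (ParseState here is a hypothetical expensive-to-create type, not a framework class), renting and returning with the SimplePool<T> above looks like this:

using System;

public sealed class ParseState // hypothetical: owns a large scratch buffer that is costly to recreate
{
    public byte[] Scratch { get; } = new byte[64 * 1024];
    public void Reset() => Array.Clear(Scratch);
}

public class Parser
{
    private static readonly SimplePool<ParseState> Pool = new();

    public void Handle()
    {
        ParseState state = Pool.Rent();
        try
        {
            state.Reset(); // always reset pooled state before use
            // ... do work with state.Scratch ...
        }
        finally
        {
            Pool.Return(state);
        }
    }
}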

However, this comes at the cost of periodic garbage collection pauses and some CPU overhead to do that work. .NET has made those pauses very short in recent versions (with background GC and many optimizations), but it's the trade-off for avoiding per-allocation free costs. The GC approach also means .NET avoids memory lingering in per-thread caches – e.g., in jemalloc, if you forget to flush thread caches, memory might sit unused; in .NET, when the GC runs, it knows exactly what is garbage and can reclaim it globally.

Unmanaged & interop scenarios: If your .NET application uses native memory (for example, using unmanaged arrays via Marshal.AllocHGlobal, or calling into a native library that uses its own allocator), then thread cache issues can bite you. Suppose you allocate native memory in C# on one thread and free it on another via Marshal.FreeHGlobal – under the hood, that goes to the OS allocator (on Windows, HeapFree; on Linux, free()). The behavior then depends on the allocator: on Windows, the Heap Manager might put that freed block into a per-heap cache (Windows heaps have LFH which is per-heap, not per-thread, so cross-thread free usually just returns it to the same heap's list – typically fine). On Linux glibc, a free might go back to that thread's arena or be placed in a list that another thread can steal from; it's complex, but significant cross-thread frees can still lead to suboptimal reuse.
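If you do control the native allocations yourself, the simplest discipline is to allocate and free within the same method (and therefore the same thread). The sketch below uses the NativeMemory APIs available since .NET 6; the callback is a placeholder:

using System;
using System.Runtime.InteropServices;

public static class NativeScratch
{
    public static unsafe void WithNativeBuffer(int size, Action<IntPtr, int> use)
    {
        void* ptr = NativeMemory.Alloc((nuint)size);
        try
        {
            use((IntPtr)ptr, size); // caller works on the buffer via the pointer
        }
        finally
        {
            NativeMemory.Free(ptr); // freed on the same thread that allocated it
        }
    }
}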

If you suspect such issues (e.g., you see memory growing when threads other than the allocating one free memory), you have a few options:

  • Use a custom allocator in native code that is designed for cross-thread freeing. Some allocators like mimalloc and rpmalloc explicitly handle cross-thread frees by quickly transferring ownership or using lock-free lists. You could P/Invoke into such an allocator for your specific use case.
  • Minimize cross-thread operations: where possible, free memory on the same thread that allocated it. This isn't always feasible (in a thread pool, work moves around), but if you have a choice, it can help the underlying allocator keep caches warm on the right thread.
  • Periodically flush caches if using an allocator like jemalloc. Jemalloc provides mallctl options to flush per-thread caches manually. While you can't easily call that from C# for the runtime's own allocs, if you embedded jemalloc for a specific purpose, you could manage it.

Example – Object pooling and thread handoff: A common .NET pattern is using a shared object pool (e.g., ObjectPool<T> from Microsoft.Extensions.ObjectPool, or a custom pool) to reuse objects. Often, one thread rents an object from the pool and a different thread returns it. .NET's pools are implemented carefully to handle thread contention (the default ObjectPool<T> implementation uses interlocked operations rather than locks and has no per-thread caching). A ConcurrentBag<T> is sometimes used as a pool – it does keep thread-local storage of items. A gotcha there is that if a thread holding cached items in a ConcurrentBag dies, those items are not immediately visible to other threads until the bag's internal work-stealing kicks in. This is somewhat analogous to cache pollution (one thread "owns" items that others can't use). The bag will eventually make them available, but heavy churn of threads can lead to delays. The recommendation is to use the pooling classes from System.Buffers (such as ArrayPool<T>) or Microsoft.Extensions.ObjectPool, as they are designed with such scenarios in mind.

How .NET 9/10 help: The improvements in .NET 9 around high allocation rates indirectly help cross-thread situations as well. For instance, the garbage collector has been tuned to reduce contention on the UOH (the large object heap and pinned object heap together). In .NET 8/9, when a thread allocates a large object, it must take a per-heap UOH lock; if many threads are allocating large objects, they contend on that lock. .NET 9's GC tuning reduced such contention by streamlining the allocation path, so less time is spent holding the lock and two threads are less likely to clash. Also, if one thread is allocating while another triggers a GC, .NET 9 and .NET 10 aim to make those operations more concurrent (e.g., background GC can reclaim objects while Gen0 allocations continue).

For cross-thread free of managed objects (which is essentially normal GC), .NET 9's smarter collection prioritization means the GC is better at deciding when to collect, potentially preventing a situation where one thread keeps allocating (increasing heap size) while another thread's objects are awaiting collection. The GC will intervene at an optimal time to collect garbage and satisfy allocations without letting the heap spiral out of control.

Bottom line: .NET's memory manager inherently avoids many cross-thread allocation pitfalls by centralizing garbage collection. For most .NET developers, this means you don't need to worry about who frees what – just allocate responsibly and let the GC do its job. If you use unsafe or unmanaged memory, stick to consistent allocation/free patterns and consider using .NET's SafeHandle or Memory<byte> techniques to manage lifetimes more explicitly (these can ensure frees happen appropriately). And as always, if you suspect an issue, use profiling tools (like Performance profilers or event tracing) to see where memory is being allocated and freed.
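As a sketch of the SafeHandle approach (wrapping Marshal.AllocHGlobal so the memory is released exactly once, no matter which thread runs Dispose or the finalizer):

using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

public sealed class HGlobalHandle : SafeHandleZeroOrMinusOneIsInvalid
{
    public HGlobalHandle(int size) : base(ownsHandle: true)
    {
        SetHandle(Marshal.AllocHGlobal(size));
    }

    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle); // runs once, even if Dispose is never called
        return true;
    }
}

// Usage (SomeNativeCall is a hypothetical P/Invoke):
// using var native = new HGlobalHandle(4096);
// SomeNativeCall(native.DangerousGetHandle());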

Best Practices and Tools for .NET Memory Management

Modern .NET (Core) is very fast, but developers can take additional steps to ensure memory usage is efficient and avoid known pitfalls. Here are some best practices and tools for .NET 9 and 10:

  • Prefer the Latest Runtimes: Upgrade to .NET 9 or .NET 10 to benefit from GC improvements that reduce fragmentation and pauses. Each version has optimized the GC algorithms, so you get a boost without any code changes.
  • Use Server GC for High-Throughput Apps: Server GC (enabled by config or by default in ASP.NET Core) creates multiple heaps and parallel GC threads, which dramatically improve allocation throughput on multi-core machines. Just be aware of memory usage – if your service is small but running on a machine with many cores, consider limiting GC heaps (via GCHeapCount or using Workstation GC) to avoid over-provisioning memory. .NET 10 will likely do this for you automatically in many cases.
  • Pool and Reuse Objects: The garbage collector is fast, but the best allocation is the one you avoid. Use object pooling for frequently used large objects or buffers. For example, use ArrayPool<T>.Shared to rent and return large arrays instead of allocating new ones every time. This prevents LOH fragmentation by recycling memory blocks (the ArrayPool is designed to reduce fragmentation). Real-world usage: Kestrel and SignalR in ASP.NET use pooling internally to minimize allocations. Pooling is especially important for things like buffers, string builders, and other large structures used in high volume.
  • Leverage Span<T> and Memory<T>: These modern value types allow you to work with slices of memory without allocating new arrays or strings. For example, if parsing data, use Span<char> to refer to a segment of a char array or ReadOnlySpan<byte> to parse a buffer – you avoid creating substrings or subarrays. Less allocation means less GC work and less fragmentation. Span<T> can even refer to stack memory via stackalloc for small temporary buffers (which then completely bypass the GC heap).
  • Avoid Long-Lived Pinning: If you pin memory (with fixed or GCHandles), try to pin for short durations. Long-lived pinned objects should ideally go on the POH (which .NET does automatically for some cases) to isolate their impact. If you find yourself needing to pin large buffers often (e.g. for interop with native APIs), consider using the POH or an unmanaged memory pool. .NET offers MemoryManager<T> and MemoryPool<T> abstractions that can be backed by native memory to handle such scenarios.
  • Profile Memory Usage: Use tools like Visual Studio Diagnostics (Memory Profiler), JetBrains dotMemory, or PerfView to understand your app's allocation patterns. PerfView, in particular, can show GC heap dumps and LOH usage over time. Look for signs of fragmentation: lots of free memory in the heap that isn't being reused, or committed memory that keeps rising even when live object sizes plateau. You can also call GC.GetGCMemoryInfo() at runtime (available since .NET Core 3.0) to get statistics on heap size and fragmentation – see the sketch after this list. If Gen2 or LOH size is growing much larger than the actual live data, that's a hint of fragmentation.
  • Tune GC for Extreme Scenarios: .NET's default settings are good for most apps. But if you have a very latency-sensitive application (e.g., a trading system), look into GC latency modes. For example, GC.TryStartNoGCRegion can be used to prevent GC during a critical section (you allocate a fixed buffer in advance). After the critical work, you end the no-GC region and garbage collect. This is advanced and needs careful planning, but it's an option. Another scenario: if you have periodic heavy allocations, you might manually call GC.Collect() at a known safe time (say, after a big batch job) to force reclamation and compact the LOH. Be cautious with manual GC calls – they can often do more harm than good if misused – but in some cases they can help stabilize memory usage.
  • Consider Unmanaged Memory for Specific Cases: If managed memory fragmentation is unavoidable for a certain workload, one strategy is to offload that to unmanaged memory and manage it yourself (or via a native allocator). For example, if you maintain a large lookup table that rarely changes, allocating it in unmanaged memory (using Marshal.AllocHGlobal or memory-mapped files) means the GC doesn't see it at all, easing GC pressure. Paired with this, use an allocator like jemalloc (you can P/Invoke into jemalloc or use libraries like jemalloc.NET which wrap it) for better control. Unmanaged memory gives you full control to allocate and free, possibly defragmenting manually, but remember you lose the safety of GC – so weigh this decision heavily. The upcoming .NET 10's improvements might make this unnecessary in many cases, as the managed heap becomes even more robust for large workloads.
  • Monitor in Production: Use runtime metrics in production to catch memory issues early. .NET exposes many counters (through EventCounters, /metrics endpoint, or tools like dotnet-counters) such as GC heap size, allocations/sec, and fragmentation metrics. By watching these, you can tell if memory usage is trending upward (possible leak or fragmentation). Also, consider using dump analysis for memory leaks – tools like dotnet-gcdump and dotnet-dump can capture heap dumps from a live process to analyze offline.
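The following minimal sketch, using documented GCMemoryInfo properties, shows one way to log heap size versus fragmentation from inside the process:

using System;

public static class GcDiagnostics
{
    public static void LogHeapStats()
    {
        GCMemoryInfo info = GC.GetGCMemoryInfo();

        long heapSize = info.HeapSizeBytes;     // total managed heap size as of the last GC
        long fragmented = info.FragmentedBytes; // free space sitting between live objects
        double fragmentationPct = heapSize > 0 ? 100.0 * fragmented / heapSize : 0;

        Console.WriteLine(
            $"Heap: {heapSize / (1024 * 1024)} MB, " +
            $"fragmented: {fragmented / (1024 * 1024)} MB ({fragmentationPct:F1}%), " +
            $"committed: {info.TotalCommittedBytes / (1024 * 1024)} MB");
    }
}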

Real-World Scenario: Applying These Strategies

Imagine a .NET microservice processing live sensor data, where each message can be up to 200 KB of JSON. An initial implementation deserializes the JSON using JsonDocument.Parse on a string – which allocates a large string for the JSON text and lots of small objects for tokens. Under load, this results in heavy Gen0 allocations and frequent LOH usage for the large string. The symptom is high CPU in GC and steadily increasing memory (fragmentation) after hours of run-time.

By adopting .NET 9 and following these best practices, we can significantly improve this service:

  • Upgrade to .NET 9 – immediately benefit from the GC enhancements that reduce large heap fragmentation and GC pause times.
  • Use Server GC in the service's Docker container to leverage all CPU cores for parallel GC, ensuring throughput scales with multi-core servers.
  • Refactor JSON processing to use System.Text.Json with Utf8JsonReader on a pooled byte[]. Instead of reading the whole payload into a string, rent a buffer from ArrayPool<byte> and read the network stream into that buffer. Parse directly from UTF8 bytes (which avoids the large string allocation altogether). Once done, return the buffer to the pool. This change eliminates large object allocations for JSON and drastically reduces Gen2 pressure.
  • Use Span and Memory throughout the parsing logic so that slicing and dicing the data doesn't create new allocations. Any small temporary buffers needed (e.g., for constructing a response) are allocated on the stack or rented from a pool if larger.
  • Run a quick load test and use dotnet-counters to observe metrics. We see Gen0/Gen1 collections happening (small objects from parsing) but Gen2 size remains stable. No more unchecked growth – memory plateaus because the pool reuses buffers and .NET 9's GC efficiently cleans up the small transient objects. CPU usage is also more stable because we eliminated many allocations and .NET's background GC handles the rest smoothly.

// Requires: using System; using System.Buffers; using System.IO; using System.Text.Json;
// Original implementation with memory issues
public SensorData ProcessSensorMessage(Stream messageStream)
{
    // Read entire message into a string - allocates large string on heap
    using var reader = new StreamReader(messageStream);
    string jsonText = reader.ReadToEnd(); // Potentially large allocation
    
    // Parse JSON - creates many small objects
    using var document = JsonDocument.Parse(jsonText);
    var root = document.RootElement;
    
    // Extract and process data
    return ExtractSensorData(root);
}

// Optimized implementation with pooling and Span<T>
public SensorData ProcessSensorMessageOptimized(Stream messageStream)
{
    // Get buffer from pool instead of allocating new
    byte[] buffer = ArrayPool<byte>.Shared.Rent(1024 * 256); // 256KB buffer
    try
    {
        int bytesRead = 0;
        int read;
        // Read directly into buffer
        while ((read = messageStream.Read(buffer, bytesRead, buffer.Length - bytesRead)) > 0)
        {
            bytesRead += read;
            if (bytesRead == buffer.Length)
            {
                // Buffer too small, get a larger one
                byte[] newBuffer = ArrayPool<byte>.Shared.Rent(buffer.Length * 2);
                Buffer.BlockCopy(buffer, 0, newBuffer, 0, bytesRead);
                ArrayPool<byte>.Shared.Return(buffer);
                buffer = newBuffer;
            }
        }
        
        // Parse JSON directly from buffer using Span
        ReadOnlySpan<byte> bufferSpan = new ReadOnlySpan<byte>(buffer, 0, bytesRead);
        return ProcessSensorJson(bufferSpan);
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(buffer);
    }
}

private SensorData ProcessSensorJson(ReadOnlySpan<byte> jsonUtf8)
{
    // Process JSON without string allocations
    var reader = new Utf8JsonReader(jsonUtf8);
    var sensorData = new SensorData();
    
    // Use stackalloc for small temporary storage
    Span<byte> tempBuffer = stackalloc byte[128];
    
    // Parse JSON directly from binary - zero string allocations
    while (reader.Read())
    {
        // Process each token directly from Span
        // ...
    }
    
    return sensorData;
}        

After these changes, the service handles higher throughput with lower memory footprint, and GC pauses (which could manifest as rare latency spikes) are shorter and less frequent. This is a concrete example of how understanding the runtime's memory allocation behavior and using modern .NET features together yield an outcome greater than the sum of parts: the runtime got faster and our code got smarter.

Conclusion

Memory allocation in .NET has evolved into a sophisticated, high-performance system in .NET 9 and .NET 10. The runtime team has tackled issues analogous to those in native allocators – from central cache locks to multi-heap fragmentation – and delivered improvements that make memory management more seamless for developers. By adopting the latest runtime and following best practices like pooling, span usage, and mindful GC tuning, .NET developers can build applications that handle memory efficiently even under stress. The result is apps that are both fast and stable: minimal garbage collection hiccups, controlled memory growth, and the confidence that comes from knowing the runtime is mitigating fragmentation and contention behind the scenes.
