Modern Memory Allocation Best Practices in .NET 9 and .NET 10
David Shergilashvili
Enterprise Architect & Software Engineering Leader | Cloud-Native, AI/ML & DevOps Expert | Driving Blockchain & Emerging Tech Innovation | Future CTO
Efficient memory allocation is at the heart of high-performance .NET applications. With the release of .NET 9 and the upcoming .NET 10, the runtime has introduced significant improvements to manage memory more effectively. This article explores modern best-practice approaches to memory allocation in .NET, focusing on how .NET 9/10 address classic allocation challenges like central allocator bottlenecks, arena contention, memory fragmentation spirals, thread-local cache pollution, and cross-thread deallocations. We'll dive into each of these issues – explaining their root causes and how the latest .NET runtime mitigates them – and provide practical examples, tools, and tips that .NET developers can apply in real-world projects.
Throughout this article, you'll find code examples that demonstrate proven techniques for efficient memory management in modern .NET applications. These examples represent best practices for avoiding common memory allocation pitfalls and leveraging the performance benefits of the latest .NET runtime improvements.
The Challenges of Memory Allocators (TCMalloc, Jemalloc, etc.)
Before examining .NET specifics, it's useful to understand the problems identified in native memory allocators (like TCMalloc and Jemalloc) that inspired many modern improvements: contention on central free lists when many threads allocate the same size classes, arena contention and the memory siloing that comes with per-arena free space, fragmentation "spirals" in which freed blocks never quite fit new requests, thread-local cache pollution, and the cost of cross-thread deallocations.
With these concepts in mind, let's see how the .NET runtime – which has its own managed memory manager (the Garbage Collector) and also interacts with unmanaged allocators – tackles these issues in .NET 9 and .NET 10.
TCMalloc-Style Central Cache vs .NET's Allocator
TCMalloc's central cache issue: The central free lists in TCMalloc ensure no single global lock serializes all allocations (each size class has its own lock). However, in scenarios with dozens of threads churning allocations of the same size, even those per-size-class locks can see contention. An allocator's performance can dip if threads frequently empty or refill their local caches (causing frequent access to the locked central list). The goal is to service most alloc/free operations from thread-local memory to stay off the locks.
.NET's approach: .NET's managed heap takes a different approach that naturally avoids a lot of central-locking overhead. Small object allocations in .NET are done via a thread-local allocation buffer (TLAB) – each thread reserves a chunk of the heap and bumps a pointer to allocate new objects, completely lock-free in the common case. Only when a thread's local heap segment is exhausted does it need to get a new segment or trigger a GC, which involves coordination. This design is conceptually similar to thread-caching allocators but even faster for allocation because there's no per-object free operation (garbage collection handles reclamation in batches). .NET's generational GC means that freeing memory doesn't involve per-allocation locking at all – dead objects are identified en masse during GC, and the heap is compacted by moving live objects, rather than maintaining intricate free lists for small objects.
Mitigating central contention: In .NET 9, the GC was further optimized to handle allocation bursts with minimal contention. For example, .NET 9 server GC can allocate on multiple threads in parallel across multiple heap segments, so threads don't all stomp on one global heap. Even in workstation GC (one heap), the runtime uses atomic operations to extend the heap when needed, keeping lock usage minimal. The .NET team continuously refines these pathways – in .NET 9, they tuned the GC for high-memory and high-allocating environments so that it scales better without becoming a bottleneck. The result is that even under heavy multi-threaded allocation loads, .NET 9 experiences fewer pauses and less contention than earlier versions.
In cases where your .NET application does a lot of unmanaged allocations (e.g. via Marshal.AllocHGlobal or native interop), you could still face central allocator bottlenecks from the OS allocator. On Windows, the default heap is already pretty good with the Low Fragmentation Heap (LFH) for small blocks. On Linux, if you see contention in malloc, one trick is to use an alternative allocator like jemalloc or tcmalloc by loading it at runtime (e.g. using LD_PRELOAD for your process). However, for pure managed allocations, the .NET GC's thread-local design means you seldom need such measures. The key best practice is to upgrade to .NET 9+ – which by default gives you the latest allocator optimizations – and to use Server GC mode for high-throughput server applications. Server GC creates multiple heaps to reduce contention (more on that next), whereas Workstation GC uses one heap for all threads.
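You can verify which mode a process actually runs in at startup; the <ServerGarbageCollection>true</ServerGarbageCollection> project property (or "System.GC.Server": true in runtimeconfig.json) is the standard way to opt in to Server GC. A minimal check (the GcModeCheck helper name is just for illustration):
using System;
using System.Runtime;
public static class GcModeCheck // illustrative helper, not a framework type
{
    public static void Print()
    {
        // True when the process is running with Server GC (one heap per logical CPU by default).
        Console.WriteLine($"Server GC: {GCSettings.IsServerGC}");
        // Shows the current latency mode (e.g. Interactive when background GC is active).
        Console.WriteLine($"Latency mode: {GCSettings.LatencyMode}");
    }
}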
Jemalloc Arena Contention vs .NET Server GC Heaps
Arena contention and fragmentation: Jemalloc's strategy of multiple arenas per process improves concurrency by reducing lock contention – threads are less likely to block each other when allocating. The trade-off, as noted, is potentially higher memory usage because free memory is siloed per arena. If one arena has a lot of free space and another runs out, the second cannot utilize the first's free space. All unused memory in each arena stays "available" but idle. In a long-running service with many threads, you can end up with dozens of arenas, each holding onto some free chunks that can't be reclaimed or reused by others, resulting in overall fragmentation. Studies have shown this can lead to 2–3× higher memory consumption for the process. Reducing the number of arenas (e.g. tuning MALLOC_ARENA_MAX in glibc) can improve memory reuse but then contention goes up – it's a classic throughput vs. memory trade-off.
.NET's approach with Server GC: The .NET garbage collector in Server GC mode also employs multiple heaps (one per logical CPU by default) to parallelize work and reduce contention. Each GC heap in server mode has its own allocation pointer and its own synchronization, so threads on different processors allocate in separate regions of memory. This is somewhat analogous to multiple arenas. The benefit is excellent throughput on multi-core machines – .NET can allocate on all threads concurrently, and during GC each heap is collected by a dedicated GC thread in parallel. The downside is that, like arenas, having more heaps can increase memory usage. The GC tries to balance load across heaps, but you might see higher memory footprint on a 32-core machine than on a 4-core machine for the same workload, simply because 32 heaps can collectively "hoard" more memory (each heap might keep some reserve). As Maoni Stephens (Microsoft's GC expert) described, server GC's memory use can scale with processor count, and it can be unpredictable – run the same app on a machine with more cores, and the GC might use a larger heap because it can spread out more. Each heap holds some long-lived objects, and each has its own free space that isn't immediately available to other heaps.
Improvements in .NET 9/10: .NET 9 introduced tuning to better handle large heaps and reduce fragmentation in high-allocation workloads. Moreover, Dynamic Adaptation to Application Sizes (DATAS), introduced as an opt-in feature in .NET 8 and enabled by default for Server GC in .NET 9, plays a bigger role going forward. DATAS can adjust the number of GC heaps on the fly based on workload and environment. For example, if you have a small app running in a container with many CPU cores, the GC might decide not to use a heap per core (which would be overkill), thereby saving memory. Conversely, under heavy load it can scale up the heap count. This dynamic tuning helps avoid the scenario of too many heaps bloating memory or too few heaps causing contention. By .NET 10, we expect the GC to leverage these heuristics even more – making server GC smarter about balancing memory vs. throughput automatically.
Best practices: For .NET developers, the key is to use Server GC for server applications (ASP.NET Core, background services, etc.) to get the concurrency benefits. This is usually enabled by default in ASP.NET Core (or you can opt in via runtimeconfig or an environment variable). If you run into memory constraints (e.g. in containers), features like DATAS (enabled by default for Server GC in .NET 9, and opt-in in .NET 8 via the DOTNET_GCDynamicAdaptationMode=1 environment variable or the System.GC.DynamicAdaptationMode runtime config setting) can help the runtime use fewer heaps for small apps; you can also cap the heap count explicitly with the System.GC.HeapCount setting. Always measure: use performance counters or dotnet-counters to monitor GC heap size and check if memory usage scales linearly with cores.
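Besides external tools like dotnet-counters, the runtime exposes the same numbers in-process through GC.GetGCMemoryInfo(). A small sketch (the HeapStats name is just illustrative) that you could call periodically to compare heap size and fragmentation across machines with different core counts:
using System;
public static class HeapStats // illustrative helper
{
    public static void Log()
    {
        GCMemoryInfo info = GC.GetGCMemoryInfo();
        // Managed heap size at the last GC and how much of it is fragmented free space.
        Console.WriteLine($"Heap size:  {info.HeapSizeBytes / (1024 * 1024)} MB");
        Console.WriteLine($"Fragmented: {info.FragmentedBytes / (1024 * 1024)} MB");
        // Memory the GC has committed from the OS, and recent time spent paused in GC.
        Console.WriteLine($"Committed:  {info.TotalCommittedBytes / (1024 * 1024)} MB");
        Console.WriteLine($"GC pause %: {info.PauseTimePercentage}");
    }
}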
Taming Memory Fragmentation Spirals in .NET
Fragmentation in .NET: Memory fragmentation is not just a native code problem – managed heaps can fragment too, especially the Large Object Heap (LOH). In .NET, objects of 85,000 bytes or more (roughly 85 KB) are allocated on the LOH, which historically does not get compacted by the GC by default (because moving very large objects can be expensive). This means if you allocate and free large objects of varying sizes, the LOH can fill up with "holes" of freed space that are not usable for new large objects unless an equal-or-smaller object comes along. Over time, the process working set grows. This is very similar to native fragmentation issues. For example, if your app frequently allocates a 1 MB object, frees it, and allocates a 1.1 MB object, that 1 MB free space might not be reused (since it's slightly smaller than needed), and the runtime will allocate a fresh 1.1 MB from the OS. Patterns like this cause LOH fragmentation "spirals" where memory usage keeps climbing despite a lot of free space in theory. Large arrays are a common culprit – e.g. reading large JSON payloads into byte arrays or creating big image buffers can fragment the LOH if not managed.
Preventing LOH Fragmentation with Buffer Pooling:
using System;
using System.Buffers;
using System.IO;
public class DataProcessor
{
    private readonly ArrayPool<byte> _pool = ArrayPool<byte>.Shared;

    public void Process(Stream input)
    {
        byte[] buffer = _pool.Rent(1024 * 512); // 512 KB buffer
        try
        {
            int bytesRead = input.Read(buffer, 0, buffer.Length); // Read may return fewer bytes than requested
            // Process the first 'bytesRead' bytes of the buffer
        }
        finally
        {
            _pool.Return(buffer);
        }
    }
}
This approach rents a buffer from a shared pool, avoiding repeated large allocations that could fragment the LOH. The ArrayPool<T> class intelligently manages a set of arrays that can be reused across your application.
Triggering LOH Compaction When Needed:
using System;
using System.Runtime;
public static class GCUtils
{
    public static void CompactLOH()
    {
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();
    }
}
This utility method can be called at safe application points (such as during low-traffic periods or scheduled maintenance windows) when you need to compact the LOH to reduce fragmentation.
.NET 9 improvements: Recognizing these challenges, .NET 9 made compaction smarter and more efficient. The GC now has an improved compaction algorithm that can reduce fragmentation with less overhead, meaning the runtime is more willing to compact memory when beneficial. In scenarios with large heaps, .NET 9 also showed better behavior – fewer pauses and more consistent memory usage even when handling tons of large objects. Additionally, .NET has provided a way to compact the LOH on demand for a while: you can call GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce and then trigger a GC, which forces a one-time LOH compaction. This can be useful at a safe point (like app startup or a maintenance window) if your LOH fragmentation is severe. .NET 9's enhancements mean you might need to do this less often, as the runtime itself tries to minimize LOH fragmentation.
Using Span<T> and Avoiding Allocations: Another key strategy to avoid memory fragmentation is to minimize allocations altogether, especially for short-lived operations. Modern .NET provides the Span<T> type which allows you to work with memory without making copies:
public void FastParse()
{
    Span<byte> buffer = stackalloc byte[128]; // Allocated on stack, not heap
    // fill and use buffer - no GC allocation happens
}
For small buffers, stackalloc can be used to allocate memory on the stack instead of the heap, completely avoiding GC overhead. This is particularly useful for parsing operations and other short-lived data transformations.
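A common refinement of this guidance is to combine stackalloc with ArrayPool<T>: stay on the stack for genuinely small buffers and fall back to a pooled array above a size threshold. A sketch of that pattern (the BufferHelper name and the 256-byte cutoff are arbitrary choices for illustration):
using System;
using System.Buffers;
public static class BufferHelper // illustrative name
{
    public static void Encode(int length)
    {
        byte[]? rented = null;
        // Small payloads use the stack; larger ones come from the shared pool.
        Span<byte> buffer = length <= 256
            ? stackalloc byte[256]
            : (rented = ArrayPool<byte>.Shared.Rent(length));
        try
        {
            Span<byte> slice = buffer.Slice(0, length);
            // fill and use 'slice' here - no per-call heap allocation in the small case
        }
        finally
        {
            if (rented != null)
                ArrayPool<byte>.Shared.Return(rented);
        }
    }
}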
Efficient JSON Processing with Span<T> and Utf8JsonReader:
using System;
using System.Buffers;
using System.Text.Json;
public class JsonParser
{
    public void Parse(ReadOnlySpan<byte> jsonUtf8)
    {
        var reader = new Utf8JsonReader(jsonUtf8);
        while (reader.Read())
        {
            if (reader.TokenType == JsonTokenType.PropertyName)
            {
                ReadOnlySpan<byte> propertyName = reader.ValueSpan;
                // handle property directly from the span - no string allocations!
            }
        }
    }
}
This example shows how to parse JSON directly from UTF-8 bytes without allocating strings for property names or values. The Utf8JsonReader works with spans, allowing zero-copy parsing of JSON data.
Pinned object heap (POH): Another source of fragmentation in .NET was pinned objects. Pinning an object (via fixed or P/Invoke) prevents the GC from moving it, so pinned objects could create "islands" of immovable memory that forced the GC to skip compaction around them. .NET 5 introduced the Pinned Object Heap to segregate long-lived pinned objects from the rest of the heap. Now, instead of pinning ordinary heap objects for long periods (with GCHandle.Alloc(obj, GCHandleType.Pinned) or fixed), you can allocate buffers that you know will stay pinned directly on the POH via GC.AllocateArray<T>(length, pinned: true), so they don't intermix with normal objects. This isolation means the main heaps can compact more effectively without worrying about pinned blocks, thereby reducing fragmentation. The POH trade-off is similar to arenas: it's a separate space so it can reduce fragmentation impact on the main heap, but within the POH itself, objects are not moved (by design), so if you pin a lot and free them, the POH can fragment. The guidance is to pin only when necessary and keep pinned objects around only as long as needed.
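Note that objects end up on the POH through explicit allocation rather than through pinning alone: you ask for a pinned array up front. A minimal sketch of allocating a long-lived interop buffer directly on the POH (the PinnedBuffers name is illustrative):
using System;
public static class PinnedBuffers // illustrative helper
{
    // Allocated directly on the Pinned Object Heap: the GC will never move it,
    // so handing it to native code never blocks compaction of the normal heaps.
    public static byte[] CreateLongLivedPinnedBuffer(int size) =>
        GC.AllocateArray<byte>(size, pinned: true);

    // Same idea, but skips zero-initialization when you will overwrite the contents anyway.
    public static byte[] CreateUninitializedPinnedBuffer(int size) =>
        GC.AllocateUninitializedArray<byte>(size, pinned: true);
}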
Real-world example: Consider a data processing service that builds large byte arrays from streams. If it naively does byte[] buffer = new byte[500_000] for each request and then lets it go, the LOH will get cluttered with freed 500 KB blocks. Over time, those blocks may not be reusable if new requests need slightly different sizes. The process's memory will grow and you may even see Gen2 collections not freeing much memory because it's all fragmentation. Upgrading to .NET 9 helps by reducing some fragmentation automatically. But the best solution is to apply buffer pooling: reuse large byte arrays instead of constantly allocating new ones. .NET's ArrayPool<T> is a great tool here. For example, you can rent a 1 MB buffer once and use slices of it for those 500 KB chunks, or rent exactly the size needed and return it to the pool after use. This way, the memory gets reused and the LOH doesn't fragment as much. In fact, using pooling in .NET 8/9 along with the improved GC can practically eliminate fragmentation issues for many scenarios.
.NET 10 outlook: .NET 10 (the next LTS) is expected to further mitigate fragmentation by possibly making LOH compaction more automatic. It wouldn't be surprising if .NET 10's GC heuristics trigger a compaction of the LOH when fragmentation reaches a certain threshold, all behind the scenes. .NET 10 will also benefit from the continuing evolution of DATAS – meaning after a surge of allocations, the GC might proactively trim the heap size down (performing a compacting GC) when the memory pressure subsides, to avoid stranded free memory lingering indefinitely. From a developer perspective, the runtime is becoming smarter, but you should still design with fragmentation in mind: prefer streaming and chunking large data, reuse buffers, and avoid creating giant objects that live just briefly if you can.
Thread Cache Pollution & Cross-Thread Freeing in .NET
As described earlier, thread cache pollution occurs in native allocators when one thread frees memory that another thread allocated, leading to that memory being held in the wrong thread's cache. In unmanaged scenarios, this can cause surprising memory growth. How does this translate to .NET's world?
Managed memory case: In pure managed code, developers don't manually free objects – the GC handles it. That means we don't have an exact analog of "cross-thread free" for managed objects. If one thread creates an object and another thread drops the last reference to it, the GC will eventually reclaim it on a background thread (or one of the GC threads). The memory will go back to the managed heap's free space. There is no concept of per-thread object caches that hold freed objects for reuse – the GC either compacts them away or leaves holes to fill with future allocations. So, by design, .NET avoids thread cache pollution for managed objects; everything freed goes back to a common pool (the heap) rather than staying owned by a specific thread. This is one of the strengths of a garbage-collected system: deallocation is centralized and optimized in bulk, rather than per free call.
Custom Object Pooling for Reuse: When you need finer control over object reuse, you might implement custom pooling. Here's a simple implementation:
using System.Collections.Concurrent;
public class SimplePool<T> where T : new()
{
    // Note: this pool is unbounded - in production you may want to cap how many items Return keeps.
    private readonly ConcurrentBag<T> _items = new();

    public T Rent() => _items.TryTake(out var item) ? item : new T();

    public void Return(T item) => _items.Add(item);
}
This pattern is particularly useful for short-lived but expensive-to-create objects like buffers, network connections, or complex data structures. By recycling objects across threads, you can significantly reduce allocation pressure.
The centralized GC approach does, however, come at the cost of periodic garbage collection pauses and some CPU overhead to do that work. .NET has made those pauses very short in recent versions (with background GC and many optimizations), but it's the trade-off for avoiding per-allocation free costs. The GC approach also means .NET avoids memory sitting stranded in per-thread caches – e.g., in jemalloc, if thread caches are never flushed, freed memory might sit unused; in .NET, when the GC runs, it knows exactly what's garbage and can reclaim it globally.
Unmanaged & interop scenarios: If your .NET application uses native memory (for example, using unmanaged arrays via Marshal.AllocHGlobal, or calling into a native library that uses its own allocator), then thread cache issues can bite you. Suppose you allocate native memory in C# on one thread and free it on another via Marshal.FreeHGlobal – under the hood, that goes to the OS allocator (on Windows, HeapFree; on Linux, free()). The behavior then depends on the allocator: on Windows, the Heap Manager might put that freed block into a per-heap cache (Windows heaps have LFH which is per-heap, not per-thread, so cross-thread free usually just returns it to the same heap's list – typically fine). On Linux glibc, a free might go back to that thread's arena or be placed in a list that another thread can steal from; it's complex, but significant cross-thread frees can still lead to suboptimal reuse.
If you suspect such issues (e.g., you see memory growing when threads other than the allocating one free memory), you have a few options: where practical, free native memory on the same thread that allocated it; on Linux, reduce the arena count (MALLOC_ARENA_MAX) to improve reuse at the cost of some contention; or swap in an alternative allocator such as jemalloc or tcmalloc via LD_PRELOAD, as mentioned earlier.
Example – Object pooling and thread handoff: A common .NET pattern is using a shared object pool (e.g., ObjectPool<T> from the Microsoft.Extensions.ObjectPool package, or a custom pool) to reuse objects. Often, threads will rent an object from the pool and another thread might return it. .NET's pools are implemented carefully to handle thread contention (typically with interlocked operations rather than coarse locks), and the default ObjectPool<T> does not keep per-thread caches of items. A ConcurrentBag<T> is sometimes used as a pool – it does keep thread-local lists of items. A gotcha there is that items sitting in one thread's local list are only handed to other threads through work stealing, so if the owning thread stops participating (or exits), its items can linger until another thread steals them. This is somewhat analogous to cache pollution (one thread "owns" items that others can't use right away). The bag does make them available eventually, but heavy churn of threads can lead to delays. The recommendation is to use ArrayPool<T> from System.Buffers or ObjectPool<T> from Microsoft.Extensions.ObjectPool for pooling (or TPL Dataflow's BufferBlock<T> for producer/consumer handoff), as they are designed with such scenarios in mind.
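If you want a ready-made, thread-safe pool rather than rolling your own, the Microsoft.Extensions.ObjectPool package is a common choice. A minimal sketch, assuming a hypothetical PooledWorkItem type as the object being recycled:
using Microsoft.Extensions.ObjectPool; // NuGet package: Microsoft.Extensions.ObjectPool
public class PooledWorkItem // hypothetical reusable object
{
    public byte[] Scratch { get; } = new byte[4096];
}
public static class WorkItemPool
{
    // Get/Return are safe to call from any thread; the default pool caps how many
    // idle objects it retains (roughly twice the processor count), so it cannot grow unbounded.
    private static readonly ObjectPool<PooledWorkItem> _pool =
        ObjectPool.Create<PooledWorkItem>();

    public static PooledWorkItem Rent() => _pool.Get();

    public static void Return(PooledWorkItem item) => _pool.Return(item);
}
Unlike the SimplePool above, this pool bounds how many idle objects it keeps, which avoids unbounded growth in long-running services.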
How .NET 9/10 help: The improvements in .NET 9 around high allocation rates indirectly help cross-thread situations as well. For instance, the garbage collector has been tuned to reduce contention on the UOH allocation path (the UOH covers the LOH and POH). In .NET 8/9, when a thread allocates a large object, it must take a lock (per-heap lock for UOH). If many threads are allocating large objects, they contend on that lock. .NET 9's GC tuning reduced such contention by possibly using finer-grained locks or speeding up the allocation path. Less time holding the lock means less chance two threads clash. Also, if one thread is allocating and another triggering a GC, .NET 9 and .NET 10 aim to make those more concurrent (e.g., background GC can reclaim objects while allocations still happen in Gen0).
For cross-thread free of managed objects (which is essentially normal GC), .NET 9's smarter collection prioritization means the GC is better at deciding when to collect, potentially preventing a situation where one thread keeps allocating (increasing heap size) while another thread's objects are awaiting collection. The GC will intervene at an optimal time to collect garbage and satisfy allocations without letting the heap spiral out of control.
Bottom line: .NET's memory manager inherently avoids many cross-thread allocation pitfalls by centralizing garbage collection. For most .NET developers, this means you don't need to worry about who frees what – just allocate responsibly and let the GC do its job. If you use unsafe or unmanaged memory, stick to consistent allocation/free patterns and consider using .NET's SafeHandle (as sketched below) or Memory<byte> techniques to manage lifetimes more explicitly (these can ensure frees happen appropriately). And as always, if you suspect an issue, use diagnostic tools (a memory profiler, or event tracing with tools like dotnet-trace or PerfView) to see where memory is being allocated and freed.
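For the unmanaged case, wrapping the allocation in a SafeHandle keeps the alloc and the free in one place no matter which thread disposes it last. A minimal sketch around Marshal.AllocHGlobal (the HGlobalBuffer name is illustrative):
using System;
using System.Runtime.InteropServices;
// Illustrative SafeHandle that owns one block of native memory.
public sealed class HGlobalBuffer : SafeHandle
{
    public HGlobalBuffer(int byteCount)
        : base(IntPtr.Zero, ownsHandle: true)
    {
        SetHandle(Marshal.AllocHGlobal(byteCount));
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    // Runs exactly once, whether triggered by Dispose on any thread or by the finalizer.
    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        return true;
    }
}
Used with a using statement (using var native = new HGlobalBuffer(4096);), the memory is released deterministically, and the finalizer acts as a safety net if Dispose is never called.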
Best Practices and Tools for .NET Memory Management
Modern .NET (Core) is very fast, but developers can take additional steps to ensure memory usage is efficient and avoid known pitfalls. Here are some best practices and tools for .NET 9 and 10:
- Run server workloads with Server GC and let DATAS adapt the heap count to the actual workload.
- Pool large or frequently used buffers with ArrayPool<T> (and reusable objects with an object pool) instead of allocating them per request.
- Prefer Span<T>, stackalloc, and UTF-8-based APIs like Utf8JsonReader to cut down on temporary strings and copies.
- Pin only when necessary, and allocate long-lived pinned buffers on the POH so they don't block compaction.
- If LOH fragmentation builds up, trigger a one-time compaction (GCLargeObjectHeapCompactionMode.CompactOnce) at a safe point.
- Measure before and after changes: dotnet-counters, event tracing, memory profilers, and GC.GetGCMemoryInfo() all expose heap size, fragmentation, and GC pause behavior.
Real-World Scenario: Applying These Strategies
Imagine a .NET microservice processing live sensor data, where each message can be up to 200 KB of JSON. An initial implementation deserializes the JSON using JsonDocument.Parse on a string – which allocates a large string for the JSON text and lots of small objects for tokens. Under load, this results in heavy Gen0 allocations and frequent LOH usage for the large string. The symptom is high CPU in GC and steadily increasing memory (fragmentation) after hours of run-time.
By adopting .NET 9 and following these best practices, we can significantly improve this service:
// Original implementation with memory issues
public SensorData ProcessSensorMessage(Stream messageStream)
{
    // Read entire message into a string - allocates large string on heap
    using var reader = new StreamReader(messageStream);
    string jsonText = reader.ReadToEnd(); // Potentially large allocation

    // Parse JSON - creates many small objects
    using var document = JsonDocument.Parse(jsonText);
    var root = document.RootElement;

    // Extract and process data
    return ExtractSensorData(root);
}
// Optimized implementation with pooling and Span<T>
// Requires: using System; using System.Buffers; using System.IO; using System.Text.Json;
public SensorData ProcessSensorMessageOptimized(Stream messageStream)
{
    // Get buffer from pool instead of allocating new
    byte[] buffer = ArrayPool<byte>.Shared.Rent(1024 * 256); // 256KB buffer
    try
    {
        int bytesRead = 0;
        int read;
        // Read directly into buffer
        while ((read = messageStream.Read(buffer, bytesRead, buffer.Length - bytesRead)) > 0)
        {
            bytesRead += read;
            if (bytesRead == buffer.Length)
            {
                // Buffer too small, get a larger one
                byte[] newBuffer = ArrayPool<byte>.Shared.Rent(buffer.Length * 2);
                Buffer.BlockCopy(buffer, 0, newBuffer, 0, bytesRead);
                ArrayPool<byte>.Shared.Return(buffer);
                buffer = newBuffer;
            }
        }

        // Parse JSON directly from buffer using Span
        ReadOnlySpan<byte> bufferSpan = new ReadOnlySpan<byte>(buffer, 0, bytesRead);
        return ProcessSensorJson(bufferSpan);
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(buffer);
    }
}

private SensorData ProcessSensorJson(ReadOnlySpan<byte> jsonUtf8)
{
    // Process JSON without string allocations
    var reader = new Utf8JsonReader(jsonUtf8);
    var sensorData = new SensorData();

    // Use stackalloc for small temporary storage
    Span<byte> tempBuffer = stackalloc byte[128];

    // Parse JSON directly from binary - zero string allocations
    while (reader.Read())
    {
        // Process each token directly from Span
        // ...
    }
    return sensorData;
}
After these changes, the service handles higher throughput with lower memory footprint, and GC pauses (which could manifest as rare latency spikes) are shorter and less frequent. This is a concrete example of how understanding the runtime's memory allocation behavior and using modern .NET features together yield an outcome greater than the sum of parts: the runtime got faster and our code got smarter.
Conclusion
Memory allocation in .NET has evolved into a sophisticated, high-performance system in .NET 9 and .NET 10. The runtime team has tackled issues analogous to those in native allocators – from central cache locks to multi-heap fragmentation – and delivered improvements that make memory management more seamless for developers. By adopting the latest runtime and following best practices like pooling, span usage, and mindful GC tuning, .NET developers can build applications that handle memory efficiently even under stress. The result is apps that are both fast and stable: minimal garbage collection hiccups, controlled memory growth, and the confidence that comes from knowing the runtime is mitigating fragmentation and contention behind the scenes.