Accelerating Legacy WCF SOAP Services Using YARP
Introduction
In the past few years, I've tackled unique challenges in modernizing legacy systems. This article shares my experience, offering solutions for those struggling with WCF or SOAP services. Even if you're unfamiliar with WCF, you can benefit from the caching techniques discussed. We'll explore how to accelerate legacy WCF SOAP services while keeping client compatibility.
Understanding WCF
Windows Communication Foundation (WCF) is a framework for building service-oriented applications and was the go-to solution for developing services for a long time. It provided a programming model that allowed developers to focus on service functionality while handling service communication through a sometimes-complex binding configuration. This often made it easy to change or support new communication protocols without significantly rewriting the service itself. Despite this separation, some communication-related aspects still needed to be added to service definitions and the classes used for information exchange through annotations (attributes). WCF offered robust security features like message signing and encryption, which might be relevant depending on the context. WCF was a mature framework with a large existing codebase.
Most commonly, bindings were related to SOAP, but WCF also supported REST and messaging protocols like MSMQ. It was possible to mix and match communication protocols and message formatting solutions to create custom bindings, which, at the peak of WCF's popularity, seemed very flexible and powerful. Some bindings allowed service contracts to be exposed as WSDL, enabling client code generation that made consuming WCF services straightforward.
Challenges with WCF and SOAP
WCF faced issues with mismatched extensions to standard protocols, such as SOAP, as implemented by different software vendors. Despite SOAP being a standard, integrating a .NET WCF service with a Java tool-generated client, or consuming a Java service with a .NET client, was not trivial. Microsoft and Sun (later acquired by Oracle) attempted to address this with Web Services Interoperability Technology (WSIT), but it was not an ideal solution, and WCF struggled to facilitate integration of heterogeneous services despite using standard communication protocols.
XML, XML Schema (XSD), and SOAP have declined in popularity mainly due to the rise of JSON, a simpler and lighter-weight data format compared to XML. JSON is easier for humans to read and write, and simpler for machines to parse.
There was a time when creating and maintaining service schemas was challenging, making XSD cumbersome for developers. Consequently, developers preferred using Plain Old XML (POX). JSON, which didn't require a schema and was more compact than SOAP or POX, seemed like the perfect solution. Ironically, services that use JSON today often rely on JSON Schema and OpenAPI, which suggests that the concept of service schemas was beneficial from the start.
To provide context and show that SOAP was not a bad idea at the time, consider that before SOAP and WSDL, there was CORBA (Common Object Request Broker Architecture). CORBA aimed to enable communication between applications written in different programming languages and running on various platforms. SOAP addressed several of CORBA's issues by using HTTP as the transport protocol and XML as the message format. This focus on HTTP simplified integration across different systems and platforms, making web services more accessible and easier to implement.
The lesson here is that technology evolves by addressing the shortcomings of previous solutions. CORBA was groundbreaking when it emerged, solving many issues of its time. Later, SOAP improved upon these solutions and addressed additional problems. Subsequently, JSON and REST resolved further issues and became the preferred choices. This cycle will continue; for instance, gRPC addresses many of the limitations of JSON and REST by utilizing HTTP/2, Protobuf, and proto definitions. It supports server streaming, client streaming, and bidirectional streaming, and enables truly heterogeneous service interactions.
The Problem with Legacy WCF Services
As XML, XML Schema, and SOAP declined, another major issue with WCF emerged: frameworks specifically designed for REST, like Web API, or the direct use of messaging client libraries, were often easier and cleaner to use. These alternatives offered better performance and were simpler to learn compared to WCF's abstractions. Consequently, Microsoft did not bring the server side of WCF to .NET Core, leaving it tied to the .NET Framework. However, many companies still rely on WCF for their services, leaving them without a clear path to modernize their legacy systems, especially since Microsoft's recommendation to use gRPC essentially requires a complete rewrite.
A full rewrite is not only time-consuming, risky, and potentially expensive, but it also requires considering all clients that use the legacy WCF service. Some of these clients may be in NuGet packages, while others may be generated from the service WSDL, meaning these clients must be significantly altered to support the new communication protocol and message format. When you add up the effort to change both the WCF service and all its clients, the total effort can be immense.
The main reason to rewrite a legacy WCF service is that, with server-side WCF not carried forward by Microsoft, you cannot move the service to .NET Core as-is. Additionally, SOAP-based WCF can be slow due to the serialization required and the large XML envelopes exchanged between the client and server. Switching from text-encoded SOAP to something like MTOM encoding, or to a custom binding such as binary over HTTP, still requires changing all clients, which is not ideal.
Solution Overview: CoreWCF and YARP
Fortunately, you can use CoreWCF, a port of the service side of WCF to .NET Core. The project's goal is to enable existing WCF services to migrate to .NET Core with minimal code changes, such as adjusting namespaces, without requiring a complete rewrite. This solution works if you are not using third-party bindings that lack CoreWCF support. If the third-party binding is open source, it can be modified; otherwise, you may face limitations. CoreWCF addresses the .NET portability issue on the service side, so existing clients, whether they run on .NET Framework or modern .NET, can keep working without changes. However, porting any .NET project to .NET Core, whether it uses WCF or not, can be complicated by non-portable NuGet references or unsupported features.
The second major challenge is performance. If you are not yet using HTTP compression, you can significantly reduce SOAP message envelope sizes by gzipping them on your web server. This usually works without client changes: clients that advertise gzip support in the Accept-Encoding request header recognize a compressed response from its Content-Encoding response header and decompress it. Compressing requests may be trickier because you might need to change the clients and prepare your service to receive compressed messages.
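To make the negotiation concrete, here is a minimal, language-neutral sketch in Python rather than web server configuration. The SOAP payload, the `GetItemsResponse` element, and the `encode_response` helper are all made up for illustration, and the header parsing deliberately ignores q-values.

```python
import gzip

# Hypothetical SOAP response; repetitive XML markup is what makes it compress well.
ROWS = "".join(
    f"<Item><Id>{i}</Id><Name>Product {i}</Name></Item>" for i in range(200)
)
SOAP_RESPONSE = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">'
    f"<s:Body><GetItemsResponse>{ROWS}</GetItemsResponse></s:Body>"
    "</s:Envelope>"
).encode("utf-8")


def encode_response(body: bytes, accept_encoding: str) -> tuple[dict, bytes]:
    """Gzip the body only if the client advertised gzip support.

    Simplification: q-values such as "gzip;q=0" are not honoured here.
    """
    offered = [token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",")]
    if "gzip" in offered:
        # Content-Encoding tells the client how to decode the body.
        return {"Content-Encoding": "gzip"}, gzip.compress(body)
    return {}, body


headers, payload = encode_response(SOAP_RESPONSE, "gzip, deflate")
print(len(SOAP_RESPONSE), "->", len(payload), "bytes")
```

A client that sends no Accept-Encoding header gets the uncompressed body back, which is why turning compression on at the server is normally safe for existing clients.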
If your service is cacheable, you'll face challenges with caching SOAP messages. Unlike REST services, SOAP services can't use caching as effectively because they rely on POST and exchange potentially large XML SOAP envelopes. This makes it difficult to improve speed without altering the clients, and fronting your service with a CDN is not feasible.
I experimented with a few solutions, such as using Nginx to calculate a hash from a POST body and headers to cache responses. Varnish also worked, but both options became complex when using round-robin routing between multiple nodes, as you can't use routing based on a consistent hash. Ideally, using a distributed cache is the best approach, but integrating cache providers in these cases is complex.
I discovered that Yet Another Reverse Proxy (YARP), a library by Microsoft for creating high-performance, production-ready, and highly customizable reverse proxy servers, offers a lot of flexibility. YARP allows for caching SOAP messages by calculating hashes from SOAP messages and headers, and adding support for distributed cache is straightforward for .NET developers.
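The hashing idea can be sketched independently of YARP. The Python snippet below is only an illustration, not the code from my proxy: the `soap_cache_key` function and the choice of headers in `KEY_HEADERS` are assumptions, and a real service may need to strip per-request elements (for example message IDs or timestamps in the SOAP header) before hashing.

```python
import hashlib

# Headers assumed to influence the response; the right set depends on the service.
KEY_HEADERS = ("soapaction", "content-type")


def soap_cache_key(path: str, headers: dict, body: bytes) -> str:
    """Derive a deterministic cache key from the request path,
    selected headers, and the raw SOAP envelope bytes."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    digest = hashlib.sha256()
    digest.update(path.encode("utf-8"))
    for name in KEY_HEADERS:
        value = lowered.get(name, "")
        # A NUL separator keeps adjacent fields from running together.
        digest.update(b"\0" + name.encode("ascii") + b"=" + value.encode("utf-8"))
    digest.update(b"\0")
    digest.update(body)
    return digest.hexdigest()


key = soap_cache_key(
    "/Service.svc",
    {"SOAPAction": '"urn:GetItems"', "Content-Type": "text/xml; charset=utf-8"},
    b"<s:Envelope>...</s:Envelope>",
)
print(key)
```

Because the key depends only on the request content, every proxy node computes the same key for the same call, which is what makes a shared distributed cache workable behind round-robin routing.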
Implementing the Solution
To sum up, .NET portability can be addressed with CoreWCF, HTTP compression can help reduce network usage, and YARP can front your SOAP service and cache responses. In my solution, I utilized several techniques: I implemented a two-layer cache with Redis for level 1 and S3 for level 2 to mitigate cold start issues. Since the responses from the SOAP service were gzipped, I stored them in a compressed format, reducing the memory and storage requirements in the caches.
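The read path of such a two-layer cache might look roughly like the following Python sketch. It is not the production code: `DictStore` is an in-memory stand-in for the Redis and S3 clients, the `TwoLevelCache` name is hypothetical, and expiry handling is omitted. Values are kept as the already-gzipped bytes, so nothing is recompressed on the way in or out.

```python
import gzip
from typing import Callable, Optional


class DictStore:
    """In-memory stand-in for a cache backend such as Redis or S3."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value


class TwoLevelCache:
    """Check the fast level first, then the durable level, then the origin."""

    def __init__(self, level1: DictStore, level2: DictStore) -> None:
        self.level1 = level1
        self.level2 = level2

    def get_or_fetch(self, key: str, fetch: Callable[[], bytes]) -> bytes:
        value = self.level1.get(key)
        if value is not None:
            return value
        value = self.level2.get(key)
        if value is not None:
            # Promote to level 1 so the next lookup is fast again.
            self.level1.put(key, value)
            return value
        # Miss on both levels: call the SOAP service and fill both caches.
        value = fetch()
        self.level1.put(key, value)
        self.level2.put(key, value)
        return value


durable = DictStore()
cache = TwoLevelCache(DictStore(), durable)
gzipped = cache.get_or_fetch("abc123", lambda: gzip.compress(b"<s:Envelope/>"))
print(len(gzipped), "compressed bytes cached")
```

After a restart that empties level 1, lookups fall through to level 2 instead of hitting the SOAP service, which is the cold start mitigation described above.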
Possible Future Improvements
Even though YARP can help cache SOAP responses and speed up requests, the solution still cannot be fronted with a CDN. By implementing a trick used by the Apollo GraphQL server called Automatic Persisted Queries (APQ), it is possible to turn HTTP POSTs into GETs. This trick involves sending back the cache key in a response header to the client, allowing it to invoke the proxy endpoint with GET and pass the key in the URL. The server can then cache requests as well as responses, storing requests with the calculated cache key. When a GET request comes in, the proxy can replay the call if the response for the key is not cached. This could allow CDN fronting for your SOAP service, though clients would need to be changed to leverage this feature. The good thing about APQ is that it is backward-compatible, meaning non-APQ enabled clients would still work as before.
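As a thought experiment, the POST-to-GET flow could be sketched like this in Python. Nothing here is implemented in my proxy: the `ReplayProxy` class, the `X-Cache-Key` header name, and the use of a plain SHA-256 of the body as the key are all assumptions made for illustration.

```python
import hashlib
from typing import Callable


class ReplayProxy:
    """Stores requests as well as responses so a later GET can replay the call."""

    def __init__(self, backend: Callable[[bytes], bytes]) -> None:
        self.backend = backend                 # forwards a SOAP envelope to the service
        self.requests: dict[str, bytes] = {}   # cache key -> original POST body
        self.responses: dict[str, bytes] = {}  # cache key -> cached response

    def handle_post(self, body: bytes) -> tuple[int, dict, bytes]:
        key = hashlib.sha256(body).hexdigest()
        self.requests[key] = body
        if key not in self.responses:
            self.responses[key] = self.backend(body)
        # The key is returned in a (hypothetical) response header so that
        # an upgraded client can switch to GET for subsequent calls.
        return 200, {"X-Cache-Key": key}, self.responses[key]

    def handle_get(self, key: str) -> tuple[int, dict, bytes]:
        if key not in self.responses:
            if key not in self.requests:
                # Unknown key: the client has to fall back to a normal POST.
                return 404, {}, b""
            # Response was evicted: replay the stored request.
            self.responses[key] = self.backend(self.requests[key])
        return 200, {"X-Cache-Key": key}, self.responses[key]


proxy = ReplayProxy(lambda envelope: b"<response for " + envelope + b">")
status, headers, body = proxy.handle_post(b"<s:Envelope/>")
print(proxy.handle_get(headers["X-Cache-Key"])[0])
```

Clients that never look at the header simply keep POSTing, which is where the backward compatibility comes from.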
YARP can also be used to calculate cache validators such as ETag and Last-Modified, which can be sent back to the client as response headers. Implementing conditional requests (where the client sends If-None-Match with a previously received ETag, or If-Modified-Since with a previously received Last-Modified value) would allow the server to respond with a 304 Not Modified status instead of sending a full response body. This would also require changes to clients but could drastically reduce network traffic.
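A conditional-request check of this kind could be sketched as follows in Python. This is an illustration only: `make_etag` and `respond` are hypothetical helpers, the ETag is assumed to be a truncated SHA-256 of the response body, and weak validators (the `W/` prefix) are not handled. It uses If-None-Match, the request header HTTP defines for revalidation that ends in 304.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Strong ETag derived from the response bytes (quoted, as HTTP requires)."""
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def respond(body: bytes, request_headers: dict) -> tuple[int, dict, bytes]:
    """Return 304 with no body when the client's validator still matches."""
    etag = make_etag(body)
    candidates = [tag.strip()
                  for tag in request_headers.get("If-None-Match", "").split(",")]
    if etag in candidates or "*" in candidates:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


envelope = b"<s:Envelope><s:Body>cached result</s:Body></s:Envelope>"
status, headers, _ = respond(envelope, {})
revalidated = respond(envelope, {"If-None-Match": headers["ETag"]})
print(status, revalidated[0])
```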
Conclusion
This solution worked quite well, and developing a custom YARP reverse proxy solution took less time than rewriting everything and changing all clients. From the clients' perspective, everything functions the same as before, except they now connect to the YARP proxy instead of the SOAP service directly or through a load balancer.