Pentesting Twirp/gRPC-Web: Recon and Reverse-Engineering
gRPC-Web has reached General Availability! The official blog announcement can be found here: https://grpc.io/blog/grpc-web-ga. gRPC makes leveraging Protocol Buffers extremely easy, and ProtoBufs in and of themselves have some very good advantages over REST when it comes to performance and contract-based development. However, so far ProtoBufs have typically been used in the backend to facilitate inter-microservice communication. Before gRPC-Web, a REST API server was required specifically to translate normal HTTP REST commands into ProtoBuf, which introduced more moving parts. With the release of gRPC-Web, developers can leverage ProtoBufs end-to-end, including in the frontend. This is summarized in the architecture diagram from the gRPC-Web blog post below:
Getting started is very straightforward, and we had no issues setting up a basic gRPC-Web-based architecture with plain JavaScript in the frontend, a NodeJS backend server, and an Envoy proxy in between. The full setup is outlined in the gRPC-Web Hello World example: https://github.com/grpc/grpc-web/tree/master/net/grpc/gateway/examples/helloworld. With that setup in place, we can begin our analysis.
Identifying gRPC-Web traffic
This is relatively straightforward. Looking at the HTTP headers of intercepted traffic, we see the tell-tale “grpc” string in several locations, such as:
- the Accept request header
- the Content-Type request header
- the Access-Control-Expose-Headers response header
Note the “grpc-web-text” application type. For the remainder of this article we assume that this is being used; however, it is worth noting that this is not the only possible value [Ref1].
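For illustration, a request/response pair for a grpc-web-text service typically carries headers along these lines (a sketch only; exact paths and values vary between deployments):

```
POST /helloworld.Greeter/SayHello HTTP/1.1
Content-Type: application/grpc-web-text
Accept: application/grpc-web-text
X-Grpc-Web: 1

HTTP/1.1 200 OK
Content-Type: application/grpc-web-text
Access-Control-Expose-Headers: grpc-status,grpc-message
```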
Decoding gRPC-Web traffic
If you observe the gRPC-Web traffic passing over the wire, you’ll notice base64-encoded data similar to the below:
This presents a new challenge and opportunity when performing penetration tests on gRPC-Web powered services. The challenge is that gRPC-Web is not as intuitive as REST APIs when viewed over the wire or intercepted in a proxy. If we decode the Base64 traffic above, we see some readable data along with some binary data, so we pass the result through a hex editor (xxd in this case):
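The same decoding step can be scripted. Below is a minimal Python sketch, assuming the grpc-web-text mode described above: the body is base64, and each message is prefixed with a 5-byte frame header (1 flag byte plus a 4-byte big-endian length). The sample payload is an illustrative frame carrying the string "world", not a capture from any particular target.

```python
import base64
import binascii

def decode_grpc_web_text(body: str) -> bytes:
    """Base64-decode a grpc-web-text payload and return the raw bytes."""
    return base64.b64decode(body)

def split_frames(raw: bytes):
    """Yield (flags, payload) tuples for each 5-byte-prefixed gRPC-Web frame.

    flags 0x00 is a normal data frame; 0x80 marks the trailers frame that
    gRPC-Web appends to responses.
    """
    offset = 0
    while offset + 5 <= len(raw):
        flags = raw[offset]
        length = int.from_bytes(raw[offset + 1:offset + 5], "big")
        yield flags, raw[offset + 5:offset + 5 + length]
        offset += 5 + length

if __name__ == "__main__":
    captured = "AAAAAAcKBXdvcmxk"   # illustrative grpc-web-text frame
    for flags, payload in split_frames(decode_grpc_web_text(captured)):
        print(f"flags={flags:#04x} len={len(payload)}")
        print(binascii.hexlify(payload, " ").decode())  # poor man's xxd
```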
We could fuzz these payloads directly, but it would definitely help if we knew the structure behind them, so that we know what to fuzz, can figure out inter-dependencies, and so on. This is where the opportunity lies: since gRPC-Web is ProtoBuf based, and ProtoBuf uses .proto files to generate client stubs, these client stubs are included in the generated frontend JavaScript code and can be used to get an idea of the gRPC message structure.
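Even before recovering the .proto, the ProtoBuf wire format itself leaks some structure: every field is prefixed with a tag encoding its field number and wire type. A rough sketch of walking the top-level fields of a decoded frame (nested messages and packed fields would need more care):

```python
def read_varint(buf: bytes, pos: int):
    """Decode a ProtoBuf varint starting at pos; return (value, new_pos)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        value |= (byte & 0x7F) << shift
        pos += 1
        if not byte & 0x80:
            return value, pos
        shift += 7

def walk_fields(payload: bytes):
    """Print field number, wire type and raw value for each top-level field."""
    pos = 0
    while pos < len(payload):
        key, pos = read_varint(payload, pos)
        field, wire_type = key >> 3, key & 0x07
        if wire_type == 0:                      # varint
            value, pos = read_varint(payload, pos)
        elif wire_type == 1:                    # 64-bit
            value, pos = payload[pos:pos + 8], pos + 8
        elif wire_type == 2:                    # length-delimited (string/bytes/message)
            length, pos = read_varint(payload, pos)
            value, pos = payload[pos:pos + length], pos + length
        elif wire_type == 5:                    # 32-bit
            value, pos = payload[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        print(f"field {field} (wire type {wire_type}): {value!r}")

# e.g. walk_fields(b"\x0a\x05world") prints: field 1 (wire type 2): b'world'
```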
Reverse-Engineering the minified JS code
In the getting started guide, WebPack was used to generate and minify the client-side JavaScript code. This makes reverse-engineering the message structure less straightforward, but it is certainly possible. We wrote a quick Python script (see the GitHub link here) that does exactly this, and it should be easy to use as a starting point for your own investigations. First, let’s look at the actual .proto file which generates the traffic (also present in the GitHub link above):
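The repository linked above contains the full file; the sketch below is an illustrative reconstruction based on the message structure described in the next paragraph (the exact package name, field names and field numbers are assumptions):

```proto
syntax = "proto3";
package helloworld;

message HelloRequest {
  string name = 1;
}

message HelloReply {
  string message = 1;
}

message Empty {}

message Result {
  string url = 1;
  string title = 2;
}

message SearchResponse {
  repeated Result results = 1;
}

service Greeter {
  rpc SayHello(HelloRequest) returns (HelloReply);
  rpc SearchResults(Empty) returns (SearchResponse);
}
```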
Using the above .proto file, the ProtoBuf compiler generates JavaScript modules which developers can use to quickly get and set values within a message. In the above example, two RPC methods are defined: “SayHello” and “SearchResults”. In the case of SearchResults, an Empty request is expected, and a message of type “SearchResponse” is returned in the response. SearchResponse is in turn a list (or array) of messages of type Result. Running the resulting webpack-minified frontend JS code through our Python script, we manage to reverse-engineer this structure, as shown in the results below:
The Python script basically parses the minified JS file looking for keywords that indicate the presence of message definitions, so JS obfuscation would thwart these efforts; however, the exercise does show that it is possible to extract the message structure with some work. Note how, during reverse engineering, we also managed to extract the gRPC endpoints being used by the JS code, which aids in recon efforts.
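To give a flavour of the approach (this is a simplified sketch, not the actual script from the repository), endpoint extraction can be as simple as grepping the bundle for quoted strings that look like /package.Service/Method RPC paths, since string literals generally survive minification; the second pattern for message symbols is a hypothetical example and would need adjusting to whatever your particular bundle contains:

```python
import re
import sys

# Quoted RPC paths of the form "/package.Service/Method".
RPC_PATH_RE = re.compile(r"""["'](/[\w.]+\.\w+/\w+)["']""")

# Quoted message symbols of the form "proto.package.Message" (illustrative pattern).
MESSAGE_RE = re.compile(r"""["'](proto\.[\w.]+)["']""")

def extract(js: str):
    endpoints = sorted(set(RPC_PATH_RE.findall(js)))
    messages = sorted(set(MESSAGE_RE.findall(js)))
    return endpoints, messages

if __name__ == "__main__":
    bundle = open(sys.argv[1], encoding="utf-8", errors="replace").read()
    endpoints, messages = extract(bundle)
    print("gRPC endpoints:")
    for endpoint in endpoints:
        print("  " + endpoint)
    print("message symbols:")
    for message in messages:
        print("  " + message)
```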
Going further
The script presented above is just a starting point to guide your efforts in reverse-engineering gRPC-Web JS files; it is not a silver bullet, since JS obfuscation or custom webpack/browserify configurations would likely change the resulting minified JS code, and parsing would subsequently not work properly. A few points to help in such situations:
- Chrome and Firefox dev tools both allow you to “prettify” minified JS code that is present on a site. During a pentest, observe HTTP calls and note the URL endpoints that are being used. Then, search the prettified JS code for these URL endpoint addresses, and set debugger break-points on or near any found statements.
- A lot of the usual techniques that we apply to REST-based APIs still apply (e.g. injection, whether XSS or SQLi, or CSRF). Search the web page for any reflections that are controlled by user input. In the toy gRPC-Web example we built, the HTML code was controlled by the URL location hash, which could lead to an injection vulnerability. gRPC-Web in and of itself does not mitigate these types of design vulnerabilities, so they should still be high on your pentest check-list, just like in typical REST API testing.