The Elephant (Payload) in the Room: Handling Super-Sized Requests with Gloo Edge
A customer recently approached us with a problem. They use another vendor’s API gateway that satisfies most of their requirements, with one notable exception: it fails on messages with elephantine payloads. They have requirements to issue requests that POST gargantuan files of up to 100MB. Could Gloo’s gateway technology help with such a problem? Another dimension to this pachydermic pickle is that they simultaneously wanted the gateway layer to inject some arbitrary custom headers into the upstream request.
The purpose of this blog post is to try to wrap our arms around this oversized issue. We’ll work through an example to determine whether Gloo Edge can help us solve this problem. Feel free to follow along on your own Kubernetes cluster.
Would you prefer to solve problems like this at an Istio-based ingress point using Gloo Gateway? Not a problem. Check out Part 2 of this blog.
Prerequisites
To complete this guide, you’ll need a Kubernetes cluster and associated tools, plus an instance of Gloo Edge Enterprise. Note that there is a free and open source version of Gloo Edge, and it will work with this example as well. We ran the tests in this blog on Gloo Edge Enterprise v1.10.12. Use this guide if you need to install Gloo Edge Enterprise. And if you don’t already have access to the Enterprise bits of Gloo Edge, you can request a free trial here.
We used GKE with Kubernetes v1.21.11 to test this guide, although any recent version with any Kubernetes provider should suffice.
For this exercise, we’ll also use some common CLI utilities like kubectl, curl, and git. Make sure these prerequisites are all available to you before jumping into the next section. We’re building this on macOS, but other platforms should be perfectly fine as well.
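Before moving on, a quick loop can confirm the tools are actually on your PATH. This is just a convenience check; we’ve included glooctl here as well, since we’ll use `glooctl proxy url` later in the exercise.

```shell
# Report which prerequisite CLI tools are installed and which are missing
for tool in kubectl curl git glooctl; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "MISSING: $tool"
  fi
done
```

If anything shows as MISSING, install it before continuing.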
Clone Github Repo
The resources required for this exercise are available in the `gloo-edge-use-cases` repo on GitHub. Clone that to your workstation and switch to the `large-payload` example directory:

git clone https://github.com/solo-io/gloo-edge-use-cases.git
cd gloo-edge-use-cases/large-payload
Install httpbin Application
HTTPBIN is a great little REST service that can be used to test a variety of HTTP operations and echo the request elements back to the consumer. We’ll use it throughout this exercise. First, we’ll install the httpbin service on our Kubernetes cluster. Run:
kubectl apply -f httpbin-svc-dpl.yaml
You should see:
serviceaccount/httpbin created
service/httpbin created
deployment.apps/httpbin created
You can confirm that the httpbin pod is running by searching for pods with an `app` label of `httpbin`:
kubectl get pods -l app=httpbin
And you will see:
NAME                       READY   STATUS    RESTARTS   AGE
httpbin-66cdbdb6c5-2cnm7   1/1     Running   0          21m
Generate Payload Files
If you’d like to follow along with this exercise, we’ll test our service using some preposterously large payloads that we generate for ourselves. (You wouldn’t want us to flood your network with these behemoths when you cloned our Github repo, would you?)
- 1MB:
base64 /dev/urandom | head -c 10000000 > 1m-payload.txt
- 10MB:
base64 /dev/urandom | head -c 100000000 > 10m-payload.txt
- 100MB:
base64 /dev/urandom | head -c 1000000000 > 100m-payload.txt
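Since everything in this exercise hinges on payload size, it’s worth sanity-checking that the generation recipe produces exactly the byte count you ask for. The snippet below uses a tiny sample file (the `sample-payload.txt` name is just for illustration):

```shell
# Generate a small sample with the same recipe and confirm its exact size;
# wc -c reports exactly 1000 bytes for sample-payload.txt
base64 /dev/urandom | head -c 1000 > sample-payload.txt
wc -c sample-payload.txt
```

You can run `wc -c` against the real payload files the same way before uploading them.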
Install a Virtual Service
`VirtualServices` are Gloo Edge custom resources that manage routing and policy enforcement on behalf of an upstream service, like httpbin in this case. We will begin with a simple configuration that forwards requests for any domain with any path to the httpbin service. It also uses a transformation to inject the custom header `x-my-custom-header` with the value `my-custom-value`. More sophisticated transformation templates are described in the Gloo Edge docs.
apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: large-payload-vs
  namespace: default
spec:
  virtualHost:
    domains:
    - '*'
    routes:
    - matchers:
      - prefix: /
      routeAction:
        single:
          upstream:
            name: default-httpbin-8000
            namespace: gloo-system
      options:
        stagedTransformations:
          early:
            requestTransforms:
            - matcher:
                prefix: /
              requestTransformation:
                transformationTemplate:
                  headers:
                    x-my-custom-header:
                      text: 'my-custom-value'
Let’s apply this `VirtualService` now.
kubectl apply -f large-payload-vs-original.yaml
This is the expected response:
virtualservice.gateway.solo.io/large-payload-vs created
Test, Test, Test
Managing with Marlin
Let’s not start with our full-grown, whale-sized payload. Instead, we’ll create a small clownfish-sized payload, which we’ll call Marlin, to get going. Note that Marlin swims upstream with its microscopic 100-byte payload with no problem. In addition, you can see the `X-My-Custom-Header` with `my-custom-value` that appears in the request headers that httpbin echoes back to the caller. So far, so good.
% curl -i -s -w "@curl-format.txt" -X POST -d "@100b-payload.txt" $(glooctl proxy url)/post
HTTP/1.1 200 OK
server: envoy
date: Wed, 25 May 2022 03:20:42 GMT
content-type: application/json
content-length: 551
access-control-allow-origin: *
access-control-allow-credentials: true
x-envoy-upstream-service-time: 9

{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "{\"text\":\"tqwypfzxzlkdhbeokohdignyslcelvuuivuprthlejxtzowhnisamykeyillwpiwocbrwmkaknehpvw0123456789\"}": ""
  },
  "headers": {
    "Accept": "*/*",
    "Content-Length": "100",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "35.185.51.108",
    "User-Agent": "curl/7.79.1",
    "X-Envoy-Expected-Rq-Timeout-Ms": "15000",
    "X-My-Custom-Header": "my-custom-value"
  },
  "json": null,
  "origin": "10.72.2.10",
  "url": "http://35.185.51.108/post"
}
time_total: 0.155001s
response_code: 200
payload_size: 100
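The `-w "@curl-format.txt"` flag used throughout this post reads a write-out format file that ships in the repo. We haven’t shown its contents, but a minimal version producing the same three trailing fields might look like the sketch below; the exact repo copy may differ. Note that `%{size_upload}` reports the number of request-body bytes curl actually sent.

```shell
# Recreate a minimal curl write-out format file (a sketch; the repo's copy may differ)
cat > curl-format.txt <<'EOF'
time_total: %{time_total}s
response_code: %{response_code}
payload_size: %{size_upload}
EOF
```

Each `%{...}` token is substituted by curl after the transfer completes; literal newlines in the file are preserved in the output.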
Cruising with Crush?
Marlin was no problem, so let’s move up the food chain by trying a sea turtle-sized payload that we’ll call Crush. Crush carries a 1MB payload, so he may create some cacophony.
curl -s -i -w "@curl-format.txt" -X POST -d "@1m-payload.txt" $(glooctl proxy url)/post
This is not the response we wanted to see from Crush:
HTTP/1.1 100 Continue
HTTP/1.1 413 Payload Too Large
content-length: 17
content-type: text/plain
date: Wed, 25 May 2022 03:29:14 GMT
server: envoy
connection: close

payload too large
time_total: 0.580871s
response_code: 413
payload_size: 2624556
An HTTP 413 response indicates that we have overflowed Envoy’s default 1MB buffer size for a given request. Learn more about Envoy buffering and flow control here and here. It is possible to increase the Envoy buffer size, but this must be considered very carefully since multiple large requests with excessive buffer sizes could result in memory consumption issues for the proxy.
The good news is that for this use case we don’t require buffering of the request payload at all, since we are not contemplating transformations on the payload, which is what we see most commonly with cases like this. Instead, we’re simply delivering a large file to a service endpoint. The only transformation we require of the Envoy proxy is to add `X-My-Custom-Header` to the input, which we have carried along since the original example.
Note that if you’d still prefer the approach of increasing Envoy’s buffer size to handle large payloads, there is an API in Gloo Edge for that, too. Check out the `perConnectionBufferLimitBytes` setting in the `ListenerOptions` API. This can be managed on a per-gateway level, as documented here. But generally speaking, eliminating buffering altogether offers superior performance and less risk.
Re-calibrating for Crush
VirtualService
Below is a revised `VirtualService` that sets the optional Gloo Edge passthrough flag. It is commonly used in cases like this to instruct the proxy NOT to buffer the payload at all, but simply to pass it through unchanged to the upstream service. Note that this flag should only be used on `VirtualServices` where you are NOT performing transformations based on the payload content, like using extractors to pull selected elements from the message body into request headers. Buffering is absolutely required for those transformation types, and enabling passthrough mode would likely cause mysterious and unexpected behavior.

Here is the change to the `VirtualService`’s transformation spec to enable massive message payloads:

requestTransforms:
- matcher:
    prefix: /
  requestTransformation:
    transformationTemplate:
      passthrough: {}   # <<====== NOTE the addition of the passthrough directive
      headers:
        x-my-custom-header:
          text: 'my-custom-value'
Now apply the “passthrough” version of the `VirtualService`:
kubectl apply -f large-payload-vs-passthrough.yaml
Expect this response:
virtualservice.gateway.solo.io/large-payload-vs configured
Note that for this and all subsequent examples, we’ll suppress the native httpbin output because it wants to echo back the entire original request payload. And life is too short to watch all of that scroll by. Instead, we’ll rely on curl facilities to show just the response bits we care about: the total processing time, HTTP response code, and confirming the size of the request payload.
Now let’s retry Crush and watch him cruise all the way to Sydney with no constrictions:
% curl -s -w "@curl-format.txt" -X POST -T "1m-payload.txt" $(glooctl proxy url)/post -o /dev/null
time_total: 2.260582s
response_code: 200
payload_size: 10000000
Bashing with Bruce
Of course, the most fearsome payloads of all swim with Bruce, the great white shark. We’ll set our bulkiest payloads against the proxy with Bruce-sized proportions, 10MB first and then our ultimate goal of 100MB.
curl -s -w "@curl-format.txt" -X POST -T "10m-payload.txt" $(glooctl proxy url)/post -o /dev/null
time_total: 20.219588s
response_code: 200
payload_size: 100000000
Finally, we achieve our goal of handling a 100MB payload:
curl -s -w "@curl-format.txt" -X POST -T "100m-payload.txt" $(glooctl proxy url)/post -o /dev/null
time_total: 48.744532s
response_code: 200
payload_size: 1000000000
Bruce ran the gauntlet with no problems, thanks to our `passthrough` directive causing the proxy to bypass buffering of the payload. Note the better-than-linear scaling with respect to payload size: even when we brought Bruce to the party and increased the payload size by two orders of magnitude, response times increased by just a bit over 20x (2.26s to 48.7s). The option to disable payload buffering with the passthrough option really paid dividends here.
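The scaling claim is easy to double-check from the recorded `payload_size` and `time_total` values (the numbers below are copied from the passthrough runs above):

```shell
# Compare growth in payload size vs growth in response time across the runs above
awk 'BEGIN {
  printf "size growth: %.0fx\n", 1000000000 / 10000000;   # largest vs smallest passthrough run
  printf "time growth: %.1fx\n", 48.744532 / 2.260582;    # time_total values from those runs
}'
```

A 100x increase in payload size cost only about a 21.6x increase in total time, which is the sublinear scaling noted above.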
Cleanup
If you’d like to clean up the work you’ve done, simply delete the Kubernetes resources we’ve created over the course of this exercise.
kubectl delete -f httpbin-svc-dpl.yaml,large-payload-vs-passthrough.yaml
You should see a response like this that confirms the resources have been deleted.
serviceaccount "httpbin" deleted
service "httpbin" deleted
deployment.apps "httpbin" deleted
virtualservice.gateway.solo.io "large-payload-vs" deleted
Learn More
- Check out Part 2 of this blog, where we solve the same problem using an Istio-based Gloo Gateway ingress instead.
- Explore the documentation for Gloo Edge.
- Request a live demo or trial for Gloo Edge Enterprise.
- See video content on the solo.io YouTube channel.
- Questions? Join the Solo.io Slack community and check out the #edge-quickstart and #gloo-edge channels.