The API gateway component in a cloud native architecture is important because it offloads critical API security and policy functionality to a common place, allowing backend APIs and services to focus on business logic. API authentication, authorization, audit, throttling and similar tasks can be complex and difficult to get right, so many organizations choose an API gateway to handle them.
What about service-to-service (S2S) or internal, east/west traffic? Forcing S2S traffic to “hairpin” back through the API gateway creates extra hops, more latency, increased traffic and decreased efficiency.
But how can you secure the traffic if you skip the API gateway and call services directly? How does the recipient service authenticate and know who is calling it?
Two common ways to secure S2S communication are:
- Using Transport Layer Security (TLS) and client certificates (mutual TLS, or mTLS).
- Using signed JSON Web Tokens (JWTs).
A service mesh solves the problem by using the first option, mTLS, automating many of the best practices and mitigating its downsides.
Developers may opt to use JWTs for S2S authentication, but this “unravels” what you expect the API gateway to do. That is, all the complexity and brittleness of handling security that you offloaded to the API gateway must be recreated and duplicated in each microservice for S2S communication. This is a major problem, because using JWTs for S2S authentication is complex and demands close attention to detail. (For more on this topic and the scenarios below, watch Hoot Episode 59: “JWT vs mTLS for Service-to-Service Authentication.” All the demos are available in our GitHub repo.)
Although there are some great frameworks and libraries for handling JWTs, and while JWT does have its place, using JWTs to authenticate S2S traffic is complex and burdensome, and it puts a lot of trust in developers to get things exactly right. This complexity (or a lack of awareness) can weaken the overall security posture if developers cut corners or ignore crucial properties. Developers also need to do this in a language- and framework-specific way (e.g., solving this in Java is different from solving it in Go or Node.js). Maintaining, patching and auditing each implementation across all codebases is costly.
How to Use JWTs to Authenticate S2S Communication
There are two approaches to creating a JWT to authenticate S2S communication: using an identity provider (IdP), also known as a security token service (STS), or allowing individual services to self-sign their JWTs. (If you’re new to JWT concepts or need a refresher, Auth0 offers a good overview of JWTs.)
Option 1: Use an STS to Issue Tokens
The first approach uses a security token service (STS), a trusted identity or token provider such as Keycloak, Okta, or Auth0, to issue tokens that represent a specific service.
In this approach, a service exchanges a long-term credential (like a username and password or OAuth 2.0 client credentials) for a JWT from the STS that says, “I am Service A.”
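With OAuth 2.0 client credentials, that exchange is a client_credentials grant. The sketch below only builds the request using Python's standard library and never sends it. The token endpoint URL is a placeholder, and any extra parameter such as audience is provider-specific (Auth0, for example, expects one named audience), so treat those as assumptions and check your STS's documentation.

```python
import base64
import urllib.parse
import urllib.request
from typing import Dict, Optional

# Hypothetical STS token endpoint; substitute your provider's real URL.
TOKEN_URL = "https://sts.example.internal/oauth2/token"


def build_token_request(client_id: str, client_secret: str,
                        extra_params: Optional[Dict[str, str]] = None) -> urllib.request.Request:
    """Build (but do not send) an OAuth 2.0 client_credentials token request."""
    # RFC 6749 lets the client authenticate with HTTP Basic: the ID and secret are
    # form-urlencoded first, then joined with ":" and base64-encoded.
    user = urllib.parse.quote_plus(client_id)
    password = urllib.parse.quote_plus(client_secret)
    basic = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    # Provider-specific fields (e.g. an audience) are passed through unchanged.
    params = {"grant_type": "client_credentials", **(extra_params or {})}
    return urllib.request.Request(
        TOKEN_URL,
        data=urllib.parse.urlencode(params).encode("ascii"),
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Because this request carries the long-lived secret, it should be sent as rarely as possible, ideally once at startup.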
Note: A “long-term” credential should be stored securely and used on a limited basis. For example, you could use it once to bootstrap identity on startup and never again. Additionally, these credentials should only be stored in memory.
The STS signs the JWT with its private key, which can be verified with its public key. Service A then attaches the JWT in a request to Service B. From here, Service B can verify the JWT was issued by the STS by verifying the JWT’s signature using the STS’s public key.
For this to work:
- The traffic must be encrypted.
- Service B must check the aud claim, as well as the expiration, issued-at and not-before claims, of the JWT.
- Service B must be prepared to update the STS public key when it is rotated.
Encrypting the traffic is critical because when JWTs are used this way, they represent a “bearer token,” so anyone with the token can impersonate Service A. Encrypting the traffic (such as with one-way server TLS) between Service A and Service B helps mitigate this.
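Because the JWT is a bearer token, Service A can refuse to put it on an unencrypted connection at all. A minimal Python sketch, assuming the token travels in the standard Authorization: Bearer header (RFC 6750); the service URL in the usage note is a made-up example.

```python
import urllib.parse
import urllib.request


def authorized_request(url: str, token: str) -> urllib.request.Request:
    """Attach a JWT as a bearer token, but only to an https:// URL."""
    # Anyone who observes a bearer token can replay it, so plain HTTP is rejected outright.
    if urllib.parse.urlsplit(url).scheme != "https":
        raise ValueError("refusing to send a bearer token over a non-TLS connection")
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Bearer {token}")
    return req
```

For example, authorized_request("https://service-b.internal/orders", token) succeeds, while the same call with an http:// URL raises. This only checks the scheme; verifying Service B's server certificate is left to the HTTP client, which urllib does by default for https.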
Service B must also check the expiration (exp), issued-at (iat) and not-before (nbf) claims, and especially the aud claim, to verify that the token is valid: not expired, within its time window and intended for use by Service B. JWTs use this convention to guard against replay attacks in which an attacker impersonates Service B, takes the token from Service A, then impersonates Service A to call Service C. A correctly implemented Service C would check the aud claim, see that the JWT representing Service A is intended for Service B, and reject it.
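The checks Service B performs can be sketched in Python as follows. The sketch assumes the token's signature has already been verified (claims from an unverified token mean nothing) and that the payload is a dict. The 30-second clock-skew leeway, and the names validate_claims and InvalidToken, are illustrative choices rather than a standard.

```python
import time
from typing import Any, Mapping, Optional


class InvalidToken(Exception):
    """Raised when a JWT's registered claims are not acceptable."""


def validate_claims(claims: Mapping[str, Any], expected_aud: str,
                    now: Optional[float] = None, leeway: float = 30.0) -> None:
    """Check aud, exp, nbf and iat on an already signature-verified JWT payload."""
    now = time.time() if now is None else now
    aud = claims.get("aud")
    # RFC 7519 allows "aud" to be a single string or an array of strings.
    if isinstance(aud, str):
        audiences = [aud]
    elif isinstance(aud, list):
        audiences = aud
    else:
        audiences = []
    # Exact match only: a missing or wildcard audience never matches this service's name.
    if expected_aud not in audiences:
        raise InvalidToken("token is not intended for this service (aud)")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)):
        raise InvalidToken("token has no usable expiration (exp)")
    if now >= exp + leeway:
        raise InvalidToken("token has expired (exp)")
    nbf = claims.get("nbf")
    if nbf is not None and now + leeway < nbf:
        raise InvalidToken("token is not valid yet (nbf)")
    iat = claims.get("iat")
    if iat is not None and iat > now + leeway:
        raise InvalidToken("token claims to be issued in the future (iat)")
```

With this check in place, the replay described above fails: a Service C that calls validate_claims(claims, "service-c") rejects a token whose aud is "service-b".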
Service A must use a different JWT for each service it calls, because the aud claim will be different for each one. Using a JWT without an aud claim, or with a wildcard aud claim, increases the damage a compromised JWT can do. Avoid doing this.
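On the calling side, this means Service A keeps a separate token per audience and refreshes each one before it expires. A single-threaded Python sketch: fetch is a hypothetical hook standing in for whatever STS call you use, and the 30-second refresh margin is an arbitrary assumption.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical hook: asks the STS for a token for one audience and returns (jwt, exp).
TokenFetcher = Callable[[str], Tuple[str, float]]


class PerAudienceTokenCache:
    """Keeps one short-lived JWT per target service, keyed by its aud value."""

    def __init__(self, fetch: TokenFetcher, refresh_margin: float = 30.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch
        self._refresh_margin = refresh_margin
        self._clock = clock
        self._tokens: Dict[str, Tuple[str, float]] = {}

    def token_for(self, audience: str) -> str:
        # Never ask for a token that would be accepted by every service.
        if audience in ("", "*"):
            raise ValueError("a specific audience is required")
        cached = self._tokens.get(audience)
        # Fetch a fresh token if none is cached or the cached one is about to expire.
        if cached is None or self._clock() >= cached[1] - self._refresh_margin:
            cached = self._fetch(audience)
            self._tokens[audience] = cached
        return cached[0]
```

The class is not thread-safe; a service that calls it concurrently would need a lock around token_for.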
Lastly, and often ignored, is rotating the STS public keys. In the event of scheduled rotation or planned revocation of the keys the STS uses to sign JWTs, Service B (or any service that relies on the STS for verification) must be able to handle updated signing public keys.
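One common way to handle rotation is to index the STS's verification keys by the kid (key ID) header of each JWT and refetch the published key set when an unknown kid appears. A Python sketch under that assumption: fetch_keys is a hypothetical hook (many providers publish their keys as a JWKS document, but downloading and parsing it is out of scope here), and the key values are opaque placeholders.

```python
from typing import Any, Callable, Dict


class RotatingKeySet:
    """Verification keys indexed by the JWT header's "kid", refreshed on rotation."""

    def __init__(self, fetch_keys: Callable[[], Dict[str, Any]]) -> None:
        # fetch_keys is a hypothetical hook returning the STS's current keys by kid.
        self._fetch_keys = fetch_keys
        self._keys: Dict[str, Any] = dict(fetch_keys())

    def key_for(self, kid: str) -> Any:
        key = self._keys.get(kid)
        if key is None:
            # An unknown kid usually means the STS rotated its keys: refresh once.
            # Replacing the whole set also drops keys the STS no longer publishes.
            self._keys = dict(self._fetch_keys())
            key = self._keys.get(kid)
        if key is None:
            raise KeyError(f"no verification key with kid={kid!r}")
        return key
```

Because an attacker can send tokens with made-up kid values to force refetches, a production version would rate-limit refreshes.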
Option 2: Have the Service Sign Its Own Tokens
The second approach uses service-specific keys to sign JWTs. You can use either symmetric keys or asymmetric keys.
In this case, Service A uses its own keys to sign the JWTs it sends to Service B (or any other service). Service B will need Service A’s public key (or symmetric key, which is more dangerous because of key exchange and impersonation concerns) to verify the JWT Service A sends. In fact, Service B will need every public key for any service that calls it using JWTs as authentication principals.
As in the STS-issued JWT example, the traffic between services must be encrypted, Service B must check the aud claim, and you need a way to handle public key rotation. You must also use a different JWT for each service you call.
Developers must put in place and track many things to get this right. Certificates must also be issued to at least every service on the receiving end to provide, at minimum, one-way TLS. Above all, the entire process hinges on key management, rotation and the safekeeping of secrets.
Where Things Can Go Wrong with JWT
We’ve covered some ways JWTs can be used to represent service identity and suggested areas of concern. There are several areas to closely watch in your services architecture to avoid security holes. If these are not bulletproof, you will give attackers opportunities to compromise your system.
One of the most important differences between using client certificates/mTLS, as a service mesh does, and using JWTs for authentication is this: JWTs send sensitive bearer token material over the wire, while mTLS does not. With mTLS, only the public certificate (and its public key) is sent over the wire, never the private key, and session keys are negotiated during the handshake. A leaked JWT, by contrast, is the secret material itself and can be replayed.
To guard against replay of the bearer token, you must limit exposure by setting short expiration times, ideally just a couple of minutes. This puts more onus on the services to refresh the JWTs for their requests. Yet expirations of hours, days or even months are all too common, and they are a big security hole.
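A Python sketch of both sides of that policy, assuming "a couple of minutes" is read as 120 seconds. short_lived_claims is how a caller might mint claims; enforce_max_lifetime is an optional extra guard on the receiving side, an illustration rather than something the text prescribes, that refuses tokens whose issuer granted them a longer life.

```python
from typing import Any, Dict, Mapping

# Assumed policy: "a couple of minutes" taken as 120 seconds.
MAX_TOKEN_LIFETIME = 120


def short_lived_claims(issuer: str, audience: str, now: int,
                       lifetime: int = MAX_TOKEN_LIFETIME) -> Dict[str, Any]:
    """Service A: mint claims that expire shortly after they are issued."""
    return {"iss": issuer, "aud": audience, "iat": now, "nbf": now, "exp": now + lifetime}


def enforce_max_lifetime(claims: Mapping[str, Any],
                         max_lifetime: int = MAX_TOKEN_LIFETIME) -> None:
    """Service B: refuse tokens whose issuer granted them an overly long life."""
    iat, exp = claims.get("iat"), claims.get("exp")
    if not isinstance(iat, (int, float)) or not isinstance(exp, (int, float)):
        raise ValueError("token must carry numeric iat and exp claims")
    if exp - iat > max_lifetime:
        raise ValueError(f"token lifetime {exp - iat}s exceeds {max_lifetime}s")
```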
Another big security gap arises when services obtain JWTs from an STS by sending their long-lived credentials (e.g., for a client_credentials flow) over the wire again and again. These long-lived credentials are extremely sensitive and should be used sparingly (e.g., on startup), not continuously.
Additionally, using wildcard aud claims or leaving aud off altogether is another big problem. Failing to apply these conventions consistently across services opens up significant security holes. Make sure to create JWTs with the correct aud claim for each service called.
Last, key rotation is just as important as short expiry and aud claim checks. Invalidating keys is a last-ditch effort to revoke JWTs in the event of a breach and should be handled as quickly and efficiently as possible.
JWT Complexity Is Where Service Mesh Simplifies Things
A service mesh simplifies service-to-service authentication and allows developers to focus on their business logic, not wrangling JWTs and secret material (hopefully) correctly. Just like an API gateway should be used for handling security for north/south and ingress traffic, a service mesh should be used for east/west and S2S traffic. Keep the services and APIs focused on the differentiating business value they can ship, not boilerplate (yet extremely important) security code.
For a deeper dive, watch our YouTube tutorial or read an article our colleagues Lin Sun and Yuval Kohavi recently wrote about using mTLS to solve S2S authentication. You can also access the GitHub repo that walks through these scenarios and how to implement S2S authentication using the Istio service mesh. The documentation in that GitHub repo shows in painful detail how difficult this is to get right.