Best Practices for Secure Istio Deployment with Gloo Mesh Core

The recent vulnerabilities discovered by Wiz in a popular cloud service highlight critical misconceptions about container security and the role of service meshes like Istio. In this case, a workflow service allowed arbitrary user code to be run, and Istio was implicated as part of the security bypass through which the researchers gained network access. Upon closer examination, the fundamental problem wasn’t with Istio itself. It was the combination of an assumption that Istio sidecars act as an egress firewall and an environment that wasn’t hardened against impersonation of the Istio sidecar. The researchers were able to:

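  1. Run an arbitrary Kubernetes pod and modify its pod spec (this is the functionality provided by the SAP AI service)
  2. Configure pods with a specific UID; one of them, 1337, wasn’t blocked by the Kubernetes admission controller. This is the UID reserved for the Istio sidecar proxy, whose own outbound traffic is exempt from the sidecar’s traffic redirection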
  3. Gain unrestricted network access.
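The gap in step 2 can be illustrated with a small check of the kind an admission webhook might run against a user-submitted pod spec before sidecar injection. This is a minimal sketch, not the control used in the affected service: the function name `find_reserved_uid_violations` is hypothetical, the pod spec is assumed to be a plain dictionary in the shape of the Kubernetes Pod `spec`, and checking `runAsGroup` alongside `runAsUser` is a conservative assumption.

```python
# UID that Istio reserves for its sidecar proxy by default.
ISTIO_PROXY_UID = 1337


def find_reserved_uid_violations(pod_spec, reserved_uid=ISTIO_PROXY_UID):
    """Return one message per securityContext that requests the reserved UID or GID.

    Hypothetical helper: pod_spec is the `spec` of a user-submitted Pod,
    inspected before any sidecar has been injected.
    """
    violations = []

    def check(context, where):
        # A missing securityContext is treated as "nothing requested".
        for field in ("runAsUser", "runAsGroup"):
            if (context or {}).get(field) == reserved_uid:
                violations.append(
                    f"{where}: securityContext.{field} is {reserved_uid}"
                )

    # Pod-level settings apply to every container that does not override them.
    check(pod_spec.get("securityContext"), "pod")

    # Container-level settings, for every kind of container a pod can carry.
    for kind in ("initContainers", "containers", "ephemeralContainers"):
        for container in pod_spec.get(kind) or []:
            name = container.get("name", "<unnamed>")
            check(container.get("securityContext"), f"{kind}/{name}")

    return violations


if __name__ == "__main__":
    user_pod = {
        "securityContext": {"runAsUser": 1000},
        "containers": [
            {"name": "job", "securityContext": {"runAsUser": 1337}},
            {"name": "helper"},
        ],
    }
    for message in find_reserved_uid_violations(user_pod):
        print(message)
```

A check like this only covers what the pod spec declares. A container image can also set its own user, so a spec-level check is one layer among several rather than a complete control.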

It’s crucial to understand that impersonating a UID and changing process namespace is not an Istio-specific issue.

Istio, like other service meshes, is not designed to be a complete security solution, particularly not as a client-side egress firewall. Istio’s documentation explicitly states that it doesn’t claim to secure pod egress and that network policies should be used instead.

Istio offers security for pod ingress using mTLS and authorization policies, and general egress management with a combination of network policies and an egress gateway.

The importance of defense-in-depth

This example illustrates the critical importance of a defense-in-depth approach in cloud-native environments. This strategy, which involves implementing multiple layers of security controls, is essential for protecting against sophisticated attacks and mitigating the impact of any single point of failure.

Here are four areas where additional layers of security could have prevented or mitigated the attack:

  1. Network Layer: While Istio provides traffic management and mTLS between services, additional network-level isolation between tenants was lacking. Kubernetes Network Policies or similar were needed for this layer of protection.
  2. Pod Security: The ability to run pods with arbitrary UIDs represents a significant security gap. A proper isolation level for user-generated pods could have prevented this.
  3. Authentication and Authorization: Once network restrictions were bypassed, internal services appeared to lack robust authentication mechanisms, highlighting the danger of an overly trusted internal network.
  4. Continuous Monitoring and Alerting: The absence of real-time monitoring for security anomalies allowed the attack to progress undetected.
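As an illustration of the network layer in item 1, the sketch below builds a Kubernetes NetworkPolicy that denies all egress from a tenant namespace except DNS. It is a starting point under stated assumptions, not a recommended production policy: the function name, the policy name `default-deny-egress`, and the choice of `kube-system` as the DNS namespace are all hypothetical, and enforcement depends on a CNI plugin that implements NetworkPolicy.

```python
import json


def default_deny_egress_policy(namespace, dns_namespace="kube-system"):
    """Build a NetworkPolicy manifest that blocks egress except DNS lookups.

    Hypothetical helper: both arguments are illustrative, and the DNS
    namespace is assumed to carry the standard
    `kubernetes.io/metadata.name` label.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-egress", "namespace": namespace},
        "spec": {
            # An empty podSelector selects every pod in the namespace.
            "podSelector": {},
            # Listing Egress makes all egress denied unless a rule allows it.
            "policyTypes": ["Egress"],
            "egress": [
                {
                    # Allow DNS only, towards the assumed DNS namespace.
                    "to": [
                        {
                            "namespaceSelector": {
                                "matchLabels": {
                                    "kubernetes.io/metadata.name": dns_namespace
                                }
                            }
                        }
                    ],
                    "ports": [
                        {"protocol": "UDP", "port": 53},
                        {"protocol": "TCP", "port": 53},
                    ],
                }
            ],
        },
    }


if __name__ == "__main__":
    # JSON manifests are accepted wherever YAML manifests are.
    print(json.dumps(default_deny_egress_policy("tenant-a"), indent=2))
```

Because the policy is enforced by the cluster network rather than inside the pod, it still applies when a workload sidesteps its sidecar. In a namespace that is part of the mesh, further allow rules would be needed, for example for traffic to the Istio control plane and to any egress gateway.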

This multi-layered approach is about more than having multiple security tools. It’s about creating a cohesive security strategy where each layer complements and reinforces the others. When one layer is compromised, the others should still provide protection.

Even with the best security tools in place, the user is responsible for fully understanding the tool’s capabilities and using it correctly. It’s like purchasing the world’s strongest door; it offers no protection if it’s installed backward. A 2023 report by Thales claims that 55% of Kubernetes-related security risks stem from misconfigurations. This statistic underscores the critical importance of proper setup and configuration in maintaining a secure environment.

Gloo Mesh Core insights

An essential component of defense-in-depth is the ability to detect and respond swiftly to potential security issues, including those arising from misconfigurations. This is where Gloo Mesh Core’s security insights and observability features become invaluable. Gloo Mesh Core seamlessly integrates with your existing Istio environments, providing comprehensive insights into service mesh performance, best practices for configuration, and detailed observability and security analysis.

The invalid application UID insight is designed to detect scenarios where pods run with unexpected or potentially dangerous UIDs. This is exactly the type of issue that was exploited in the above case. The insights engine inspects your configuration and live telemetry data from its OTel pipeline to validate that your environment aligns with the best practices Solo.io has built over years of helping customers deliver production-grade platforms at scale.

Gloo Mesh Core leverages OpenTelemetry to collect telemetry data from various sources across all your clusters. With OpenTelemetry, you can establish pipelines for these diverse sources, consolidating all your telemetry data in a single location. The pipelines make it easy to integrate the Gloo Mesh Core insights engine with any OpenTelemetry-compatible backend, including OSS tools like Prometheus, or observability vendors. The Gloo UI presents these observability details in a unified, single pane of glass. Its comprehensive service graph is another key tool for identifying and addressing security issues.

Conclusion

By leveraging Gloo Mesh Core’s insights and following these best practices, organizations can build more secure, resilient, and observable service mesh deployments. This proactive approach not only helps prevent vulnerabilities like the one discovered by Wiz, but also provides the visibility and control needed to manage complex, cloud-native environments securely.

Security in modern cloud-native architectures is an ongoing process. Regular audits, continuous monitoring, and staying updated with the latest security practices are key to maintaining a robust security posture in your Istio-enabled environments.