
Cloud Service Mesh: The new Managed Service Mesh by Google


Introduction

At the recent Google Cloud Next ’24, Google Cloud announced the launch of Cloud Service Mesh (CSM), a fully managed service mesh. The announcement drew keen interest from technology enthusiasts and professionals alike, as this new service represents a significant advance in the field of service meshes.

It aims to simplify the operation of applications and improve their scalability across environments. Before exploring this innovation further, let’s look at the main reasons for adopting a service mesh.

Understanding the Need for Service Mesh

First, it’s crucial to understand why service meshes are integral to modern software architectures.

  • Service Mesh is indispensable for building scalable, globally reliable applications that operate across different computing environments and infrastructures. It enables automated service discovery and ensures smooth service-to-service communication, irrespective of the underlying runtime environment.
  • In environments with dynamic microservices and varied technology stacks, maintaining robust security is critical. Service Mesh provides a strong policy framework and implements policy enforcement points at the service layer. This facilitates consistent adherence to policies, even as services move dynamically, laying the groundwork for a zero-trust network architecture.
  • Service management within Service Mesh focuses on two main areas:
    • Telemetry offers detailed insights into service performance at both a global and granular level, covering aspects like logging, metrics, tracing, and the setup of service level objectives (SLOs) and alerts.
    • Advanced traffic management techniques, such as canary releases or blue-green deployments, allow for the safe rollout of new service versions, along with efficient rollback processes.
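In a mesh, a canary release is usually expressed as weighted routing: for example, an Istio route that sends 90% of requests to the current version and 10% to the new one. The sketch below simulates that routing decision in plain Python to show how a weight split behaves, and why a rollback amounts to moving the weight back. The version names, weights, and the `route` and `simulate` helpers are hypothetical illustrations, not CSM or Istio APIs.

```python
import random
from collections import Counter


def route(weights: dict[str, int], rng: random.Random) -> str:
    """Choose a destination version with probability proportional to its weight,
    mimicking a weighted route in a mesh (hypothetical helper, not an Istio API)."""
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


def simulate(weights: dict[str, int], requests: int, seed: int = 42) -> Counter:
    """Send `requests` simulated calls through the weighted route and count
    how many land on each version."""
    rng = random.Random(seed)  # fixed seed for reproducible results
    return Counter(route(weights, rng) for _ in range(requests))


# Canary: 90% of traffic stays on v1, 10% goes to the new v2.
canary = simulate({"v1": 90, "v2": 10}, requests=10_000)
print(canary)

# Rollback: move all weight back to v1; v2 receives no traffic.
rollback = simulate({"v1": 100, "v2": 0}, requests=10_000)
print(rollback)
```

In a real mesh the proxy (or proxyless gRPC client) makes this choice per request from configuration pushed by the control plane, so shifting traffic is a configuration change rather than a redeployment.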

Challenges with Service Mesh Offerings 

Despite these advantages, the broader adoption of service meshes is held back by their complexity. Managing the lifecycle of the mesh’s many components, integrating them, and maintaining them are significant obstacles.

Moreover, concerns regarding the sustainability, scalability, and reliability of the chosen technologies further complicate adoption decisions. Ultimately, the success of Service Mesh solutions depends on their robustness and adaptability.

To address this, Google developed two products that streamline the implementation of a service mesh in various environments.

  • Anthos Service Mesh (ASM) is an advanced service built upon the Istio API, designed to operate smoothly across various cloud platforms such as GCP, AWS, Azure, Bare Metal, and VMware. It is tailored for platform administrators and application operators and is available in both managed (currently only on GCP) and non-managed versions.
  • Traffic Director (TD), another Google service mesh product, utilizes proprietary GCP APIs to manage internal traffic across different compute runtimes within Google Cloud. It is mainly aimed at network administrators.

Introducing Cloud Service Mesh (CSM): A Unified Solution of ASM and TD

As Anthos Service Mesh (ASM) and Traffic Director (TD) continued to coexist, their overlapping functionalities—especially in Layer 7 networking—began to confuse users. Furthermore, users encountered limitations with Istio-based APIs when operating outside of Kubernetes environments.

In response, Google Cloud launched Cloud Service Mesh (CSM), which unifies Anthos Service Mesh and Traffic Director into a single product. CSM provides a globally scalable control plane that operates across both Google Cloud and external cloud environments.

This fully managed service encompasses comprehensive lifecycle management for both the control plane and data plane, enhancing integration with existing Google Cloud networking capabilities.

Key Features and Benefits of Cloud Service Mesh:

  • Global Control Plane: CSM ensures smooth service mesh operations across both Google Cloud and external environments, offering a cohesive management experience.
  • Fully Managed Service Mesh: Google Cloud handles all aspects of lifecycle management for both the control and data planes, including seamless integration with existing Google Cloud networking features.
  • Streamlined Integration: CSM addresses and resolves integration challenges with other Google Cloud networking functionalities, simplifying the user experience.
  • Enterprise-Grade Focus: Designed with a focus on security, reliability, scalability, and critical enterprise use cases, CSM aims to meet the highest standards of enterprise requirements.

Cloud Service Mesh Architecture

Let’s dive into how Service Mesh fits and functions within your infrastructure.

First, consider your services and workloads. These can run on virtual machines (VMs), on Kubernetes with Google Kubernetes Engine (GKE), or on a serverless platform such as Cloud Run.

For connectivity, you need to manage traffic between these services and from the internet. To facilitate this, Google offers a range of load balancing options, such as Layer 4 and Layer 7 load balancers.

Now, how does Service Mesh come into play? Once your workloads are deployed, Service Mesh allows you to enhance security and manage traffic failover and load balancing between your services.
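To make "failover and load balancing" concrete, the sketch below models a common mesh behavior: endpoints are grouped into priority tiers by locality (same zone first, then farther away), requests are spread round-robin across the healthy endpoints of the closest tier, and traffic fails over to the next tier when health checks fail. The `Endpoint` type, the addresses, and the helpers are hypothetical. It is also a deliberate simplification: proxies such as Envoy spill traffic over to lower-priority tiers gradually as the share of healthy hosts drops, whereas this sketch fails over all at once.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Endpoint:
    address: str
    priority: int   # 0 = closest locality (e.g. same zone); higher = farther away
    healthy: bool   # result of the latest health check


def healthy_tier(endpoints: list[Endpoint]) -> list[Endpoint]:
    """Return the healthy endpoints of the closest priority tier that has any.
    Simplification: real proxies spill over gradually, not all at once."""
    for priority in sorted({e.priority for e in endpoints}):
        tier = [e for e in endpoints if e.priority == priority and e.healthy]
        if tier:
            return tier
    raise RuntimeError("no healthy endpoints in any tier")


def pick(endpoints: list[Endpoint], n: int) -> list[str]:
    """Spread n requests round-robin across the selected tier."""
    rr = cycle(healthy_tier(endpoints))
    return [next(rr).address for _ in range(n)]


# Hypothetical backends: two in the local zone, one in a neighbouring zone.
backends = [
    Endpoint("10.0.0.1", priority=0, healthy=True),
    Endpoint("10.0.0.2", priority=0, healthy=True),
    Endpoint("10.1.0.1", priority=1, healthy=True),
]
print(pick(backends, 4))  # ['10.0.0.1', '10.0.0.2', '10.0.0.1', '10.0.0.2']

# The local zone fails its health checks: traffic fails over to priority 1.
degraded = [Endpoint(e.address, e.priority, e.priority != 0) for e in backends]
print(pick(degraded, 2))  # ['10.1.0.1', '10.1.0.1']
```

The point of the mesh is that this logic lives in the data plane and is configured centrally, so individual services do not have to implement it themselves.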

A service mesh consists of two primary components: a control plane and a data plane. In Cloud Service Mesh, the control plane is built on Google’s Traffic Director and acts as a global control plane across Google Cloud.

It supports a wide range of APIs, including open core APIs such as Istio and the Gateway API, as well as Google-specific APIs such as the AppNet or service routing APIs. These APIs work across VMs, Cloud Run, and Kubernetes and form the backbone of the mesh.

Google Cloud wants Service Mesh to be a core part of a company’s infrastructure on GCP, not just an add-on. Customers want it to work seamlessly with other GCP services.

To achieve this, Google Cloud is integrating Service Mesh with various GCP networking features, including data extensibility, rate limiting, load balancing, and identity management. This will create a unified system for managing services within GCP.

The main objective is to extend beyond GCP to every environment where customers’ workloads run, including on-premises data centers and other cloud platforms. Outside GCP, the control plane is Istiod, the open-source Istio control plane, which configures Envoy proxies and gRPC clients to support a wide range of workload types.

Through CSM, Google Cloud supports several runtime environments for your workloads, such as GKE, Compute Engine (GCE), and Cloud Run. Some use cases require an application to span multiple runtime environments rather than just one.

These environments are managed by two main control planes:

  • Traffic Director for workloads on GCP
  • Istiod for workloads off GCP

CSM also supports a range of open APIs, known as Open Core APIs.

Additionally, integration with the platform will increase, covering load balancers, the Data Extensibility Platform, and health checking. Otherwise, the architecture follows the usual service mesh pattern: a globally scalable control plane on GCP that also extends to off-GCP environments, and a data plane that offers several modes.

Furthermore, CSM includes additional mesh services that enhance the mesh in various ways. These can be deployed through several interfaces, such as Terraform, the gcloud CLI, or the Google Cloud console, where enabling a feature can be as simple as ticking a checkbox. Let’s look at each of these aspects in more detail.

CSM Control & Data Plane

On GCP, the control plane is always hosted and fully managed by Google Cloud: Google Cloud handles its lifecycle for you, and it scales globally. The result is a single control plane that spans your environments and supports both Google and Istio APIs.

Off GCP, the control plane runs locally (Istiod in your cluster) and supports only the Istio APIs.

Turning to the data plane, the most common mode is the sidecar mode. In this mode, a proxy is deployed next to every single workload. This proxy is a full L7 proxy, deployed whether or not it is strictly needed, even if you only use L4 features or rely on it solely for mTLS.

This approach uses a shared fate model: whenever you manage the lifecycle of either the application or the proxy, both must be restarted together. The method has nonetheless proven effective, providing a client-side model that allows fine-grained control over traffic.

At the opposite end of the spectrum is the proxyless gRPC mode, in which no proxy is used at all. If your services communicate primarily over gRPC, the gRPC libraries receive their configuration directly from the CSM control plane, which is much less resource-intensive than running sidecars. According to Google, this mode can cover 80% to 90% of typical needs with minimal resource overhead.

Between these two extremes sits ambient mode. It lets the mesh operate at either L4 or L7: you choose the level you need and can move from L4 to L7 when necessary. This mode does not use sidecars.

CSM will support all three modes, providing a range of options depending on your specific needs and the complexity of your traffic management.

For mesh services, Google Cloud offers a managed certificate authority (CA): Google Cloud can fully manage the root certificates, or you can manage them yourself. Google Cloud also provides extensive integration capabilities.

CSM also provides a services dashboard, security insights, service level objectives (SLOs), and alerts. Its CloudOps features are integrated with logging and tracing. In addition, you can create custom dashboards and metrics, and Google Cloud integrates with Policy Controller to help establish a zero-trust network environment.
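SLOs and their alerts rest on simple arithmetic: an SLO such as 99.9% availability implies an error budget (the share of requests allowed to fail), and alerting typically watches the burn rate, i.e. how fast that budget is being consumed relative to the SLO. The sketch below shows that arithmetic with hypothetical numbers; the helper names are illustrative and not part of any Google Cloud API, since managed SLO tooling computes these values for you.

```python
def error_budget(slo: float, total_requests: int) -> float:
    """Number of requests allowed to fail in the SLO period (slo e.g. 0.999)."""
    return (1.0 - slo) * total_requests


def budget_remaining(slo: float, total_requests: int, failed: int) -> float:
    """Fraction of the error budget still unspent (negative means the SLO is breached)."""
    budget = error_budget(slo, total_requests)
    return (budget - failed) / budget


def burn_rate(slo: float, window_error_ratio: float) -> float:
    """How fast the budget burns: 1.0 means it would be used up exactly at the end
    of the SLO period if this error ratio persisted; higher means it runs out early."""
    return window_error_ratio / (1.0 - slo)


# Hypothetical month: 1,000,000 requests against a 99.9% availability SLO.
slo, total, failed = 0.999, 1_000_000, 250
print(f"budget: {error_budget(slo, total):.0f} failed requests")  # budget: 1000 failed requests
print(f"remaining: {budget_remaining(slo, total, failed):.0%}")  # remaining: 75%

# Over the last hour, 0.5% of requests failed.
print(f"burn rate: {burn_rate(slo, 0.005):.1f}x")  # burn rate: 5.0x
```

A burn-rate alert fires when this value stays above a chosen threshold over a given window, which catches fast-burning incidents without paging on every isolated error.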

Impact on existing customers using Anthos Service Mesh & Traffic Director

Now let’s see what Cloud Service Mesh (CSM) means for you, especially if you currently use Google Cloud’s Anthos Service Mesh (ASM) or Traffic Director (TD).

  • For ASM customers on Google Cloud, you will continue to use the Istio-based APIs and benefit from a managed control plane and data plane. If you use Mesh CA or Certificate Authority Service and manage the mesh through the UI, the main change is that clusters with a managed control plane will be gradually moved to the Traffic Director control plane over the next two to three quarters. You will keep access to the same open APIs and other managed services.
  • For ASM users operating outside of Google Cloud Platform—whether on other clouds or on-premises—there are no immediate changes. You will continue to use the Istio APIs and maintain Istiod in your cluster. Looking ahead, Google Cloud plans to transition to a more managed environment. Currently, you manage updates in the off-GCP environment yourself, but Google Cloud will introduce an API to simplify this process and reduce the management burden on your clusters.
  • For Traffic Director customers, there are no changes to your configuration. The AppNet APIs that manage endpoints on Cloud Run, GKE, and VMs will continue to function as usual, ensuring a consistent experience moving forward.

Future Prospects and Strategic Direction

Looking forward, Google Cloud plans to deepen the integration of CSM with the Google Cloud Platform (GCP). Expect to see enhanced connections with GCP services such as load balancers, Cloud Armor, Identity-Aware Proxy (IAP), and health checks.

These enhancements will be directly incorporated into the platform, in line with customer feedback advocating for a fully integrated service mesh, rather than a peripheral addition.

Google Cloud is committed to open standards and supports open APIs and Istio. It will also offer GCP-specific APIs for VMs, GKE, and Cloud Run. To ensure stability, Google Cloud will prioritize well-established Istio APIs over experimental ones.

The Gateway API, designed to manage ingress and egress traffic in Kubernetes, is being extended to service meshes through the GAMMA (Gateway API for Mesh Management and Administration) initiative. The API is extensible, which allows extensions for GCP-specific features.

Moreover, CSM can operate in other environments, including other public clouds and on-premises setups through GKE Enterprise. It will also integrate with Google Distributed Cloud Connected (formerly GDC-Edge) and Google Distributed Cloud Air-gapped (formerly GDC-Hosted).

While Istio and Traffic Director offer robust functionality, Google Cloud recognizes their complexity. To address this, they’re streamlining the service mesh data plane for a more business-friendly experience.

This includes tighter integration with GCP networking, simplified management through CloudOps, and automated setups for service level objectives (SLOs) and alerts. For those who prioritize extensive features, Google Cloud remains committed to fully supporting Istio.

Meanwhile, CSM is evolving into a fully managed, enterprise-ready solution, with enterprise-level functionality and minimal management effort for users. Its development maintains compatibility with Istio and focuses on scalability and large-scale integration.

Enhancements such as multi-cluster ingress and new data plane modes demonstrate Google Cloud’s commitment to improving GCP networking and its extensibility.

Conclusion

The announcement of Cloud Service Mesh at Google Cloud Next ’24 is a clear indicator of Google Cloud’s strategic direction towards more integrated, scalable, and manageable networking solutions within GCP.

By reducing the complexity traditionally associated with service meshes and enhancing their functionality, Google Cloud is poised to meet the evolving needs of modern enterprises, making it easier than ever to deploy, manage, and secure applications across multiple platforms.

As businesses continue to navigate the complexities of digital transformation, the advancements in service mesh technology such as those offered by CSM will be crucial in enabling more efficient and secure cloud computing solutions.


Written by: Matthieu Audin, App Modernisation Practice Lead, EMEA at Devoteam G Cloud