
KubeCon Day 1: Multi-cluster failover with Linkerd & Google Reviews

KubeCon is the yearly not-to-miss conference around all things Kubernetes. This year the flagship event took place in Valencia, Spain. We were lucky enough to attend KubeCon with colleagues from Devoteam G Cloud from all over EMEA! This article is part 1 of my recap of KubeCon 2022.

This article is written by Jason Quek, Global CTO at Devoteam G Cloud

After the keynote presentations at KubeCon, covering topics such as KubeEdge and Mercedes' journey to Cloud Native tech, I chose to join a session titled "Multi-cluster failover with Linkerd".

Given my background as a Google Cloud Certified Hybrid and Multi Cloud Fellow, I had actually worked on Linkerd before adopting the Istio and Anthos Service Mesh standards set out by Google Cloud. With Anthos Service Mesh, it is clear how to build a multi-cluster mesh for failover using tech such as Ingress for Anthos on GKE and Traffic Director, both with built-in capabilities to protect against single-cluster failure.

This is an important topic for Devoteam customers who are evaluating and choosing their computing platform, orchestration framework and now, their service mesh.

Linkerd has a well-earned reputation for its simple configuration and easy getting-started experience, and is currently an incubating project in the Cloud Native Computing Foundation. It covers largely the same areas as Istio, such as observability, security and reliability: it can observe latencies between services and secure traffic between microservices, with the good stuff such as circuit breaking. It uses the same pattern of annotations and sidecars as Istio, with a control plane, but uses a Rust proxy implementation instead of the Envoy proxy used by Istio. One difference is that Linkerd's proxy is purpose-built, which means it cannot be run on its own, unlike Envoy.
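To illustrate that annotation pattern (my own minimal sketch, not from the talk, with hypothetical namespace names): Linkerd injects its proxy into workloads whose namespace or pod spec carries the linkerd.io/inject annotation, while Istio typically keys off a namespace label.

```yaml
# Sketch: opting a namespace into Linkerd's sidecar injection (namespace name is hypothetical).
apiVersion: v1
kind: Namespace
metadata:
  name: sample-apps
  annotations:
    linkerd.io/inject: enabled      # Linkerd's proxy-injector adds the Rust proxy sidecar
---
# The rough Istio equivalent is a namespace label rather than an annotation.
apiVersion: v1
kind: Namespace
metadata:
  name: sample-apps-istio
  labels:
    istio-injection: enabled        # Istio's webhook injects the Envoy sidecar
```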

Personally, I like purpose-built components for frameworks, which often means stripping away code and functionality the framework does not need. However, the use of Rust, while extremely efficient, does limit the group of developers who can understand and debug it.

For a multi-cluster setup, Linkerd follows a similar paradigm to Istio's multi-primary, multi-network topology for a multi-cluster mesh, with east-west gateways linking the clusters.

Linkerd has a specific extension named Failover, which allows an administrator to define something like the example below.

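Going from the linkerd-failover extension's documented pattern rather than the exact slide, the failover policy is declared as an SMI TrafficSplit that the extension watches and re-weights; the sketch below uses hypothetical backup service names, and the exact label and annotation keys may differ between versions.

```yaml
# Sketch of a linkerd-failover policy: an SMI TrafficSplit managed by the failover operator.
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: sample-svc
  labels:
    app.kubernetes.io/managed-by: linkerd-failover    # marks the split as managed by the extension (assumed key)
  annotations:
    failover.linkerd.io/primary-service: sample-svc   # the primary service, preferred while it has ready pods
spec:
  service: sample-svc
  backends:
    - service: sample-svc            # primary, local service
      weight: 1
    - service: sample-svc-central1   # hypothetical mirrored service from another cluster
      weight: 0
    - service: sample-svc-east1      # hypothetical mirrored service from another cluster
      weight: 0
```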

This allows you to specify backup services with weights: if no pods in sample-svc are in a ready state to serve, and any of the backup services has at least one ready pod, traffic is routed to that backup service.

Compare this to how you would approach the same problem in Istio, which is via a DestinationRule.

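A rough sketch of my own (hypothetical host and region names, not the exact definition from the talk): in Istio, failover behaviour typically comes from combining locality-aware load balancing with outlier detection in a DestinationRule.

```yaml
# Sketch: locality failover plus outlier detection for sample-svc.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: sample-svc-failover
spec:
  host: sample-svc.default.svc.cluster.local   # hypothetical service host
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true
        failover:
          - from: europe-west1     # hypothetical primary region
            to: europe-north1      # hypothetical failover region
    outlierDetection:              # required for locality failover to take effect
      consecutive5xxErrors: 5      # errors tolerated before an endpoint is ejected
      interval: 30s
      baseEjectionTime: 60s
```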

One common complaint about Istio is the complexity of its YAML definitions and the need to understand the relationships between VirtualService, DestinationRule and Gateway.

This is an example of how it could be simplified, so that failover is an explicit definition, as in Linkerd, rather than a by-product of other configuration. However, you can see that Istio lets you configure the failure conditions more to your needs, with different load-balancing methods and control over the number of errors tolerated before failing over. I believe that for Istio to become more user friendly and easy to use, some lessons have to be learnt from the kind of concise extensions that Linkerd supports.

Taking the chance to meet and talk with Google and customers

KubeCon is not only a chance to learn about Cloud Native Computing Foundation tech and hear battle stories, but also a chance for Devoteam G Clouders to meet and talk with Google and our customers.

One such session was a UI/UX feedback session on Anthos Fleet management on GKE, run by product managers and UI/UX specialists. I have been giving feedback on GKE and Anthos topics since 2019, starting with GKE on-prem, and have noticed that suggested improvements have always been considered and sometimes implemented. That is really what makes it so great to work on Google Cloud: being able to influence the product and give feedback, on behalf of our customers, to the people who actually build it, making it easier for us to do our jobs.

The next session I attended was KubeEdge: From Fixed Location to Movable Edge. Kubernetes at the edge is a pretty hot topic at the moment, but, as usual, it comes in many flavors. I have been experimenting mostly with two types of edge deployments: K3s and Anthos on Bare Metal.

K3s is a lightweight, highly available and conformant Kubernetes distribution, currently a CNCF sandbox project. The key word is conformant: it means K3s clusters can be installed and then run as attached Anthos clusters.


With Anthos on Bare Metal, this is even simpler, thanks to an installer provided and supported by Google called bmctl. You would first install an OS on your edge node, either CentOS, RHEL or Ubuntu (I would suggest RHEL or Ubuntu due to the CentOS Linux 8 EOL on 31st Dec 2021). You can run Anthos on Bare Metal in standalone, multi-cluster or hybrid mode, which means the lowest number of nodes you need is one. This makes the edge rollout strategy simpler: stamp out installations on the edge devices via Ansible playbooks, ship them off to edge locations such as retail stores or stadiums to do low-latency on-site calculations, and deploy applications via the Anthos Connect Gateway after the devices have been plugged in. A standalone cluster config for this kind of single-node edge setup looks roughly like the sketch below.
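This is an abbreviated sketch from memory of a bmctl-generated standalone cluster config with the edge profile (Anthos on Bare Metal around version 1.11); field names may differ between versions, and the project, cluster name and addresses are hypothetical.

```yaml
# Abbreviated sketch of a standalone, edge-profile cluster config for bmctl (hypothetical values).
apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
  name: edge-store-001
  namespace: cluster-edge-store-001
spec:
  type: standalone                  # one cluster acts as both control plane and workload cluster
  profile: edge                     # reduced footprint for small edge hardware
  anthosBareMetalVersion: 1.11.0
  gkeConnect:
    projectID: my-fleet-project     # fleet project used by the Anthos Connect Gateway
  controlPlane:
    nodePoolSpec:
      nodes:
        - address: 10.0.0.10        # the single edge node
  clusterNetwork:
    pods:
      cidrBlocks: [192.168.0.0/16]
    services:
      cidrBlocks: [10.96.0.0/20]
  loadBalancer:
    mode: bundled                   # bundled load balancing running on the node itself
    vips:
      controlPlaneVIP: 10.0.0.100
```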

KubeEdge is a new flavor of this and is currently a CNCF incubating project. You still need Kubernetes running in the cloud somewhere, where you install a component named CloudCore. On the edge nodes, you then install EdgeCore via a binary, which includes lightweight components for IoT communication, specifically EventBus, which supports MQTT natively. It also has a component named DeviceTwin, which keeps device status and device twin information in sync between the cloud and the edge. This means a lightweight, opinionated way of handling communication between devices is already built into the product, so you do not have to search for that solution when you start deploying your applications at the edge. However, it still requires some inbound ports to be opened for communication between CloudCore and EdgeCore, as can be seen from the join command run from the edge node against the CloudCore node.

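Coming back to the EventBus, here is a sketch from memory of how that module is configured in the edge node's edgecore.yaml; module and field names may differ between KubeEdge versions, and the broker addresses are hypothetical defaults.

```yaml
# Excerpt sketch of edgecore.yaml: the EventBus module bridges MQTT topics to the cloud.
modules:
  eventBus:
    enable: true
    mqttMode: 2                               # 0 = internal broker, 1 = both, 2 = external broker only
    mqttQOS: 0                                # MQTT quality-of-service level
    mqttRetain: false
    mqttServerExternal: tcp://127.0.0.1:1883  # external broker, e.g. Mosquitto running on the edge node
    mqttServerInternal: tcp://127.0.0.1:1884  # internal broker used when mqttMode permits
```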

I was not that certain after the session how this would behave in disconnected mode, something that Anthos on Bare Metal has considered. I'm sure this problem has been solved somehow, as the KubeEdge solution was showcased in the KubeCon keynote, running on satellites with machine learning inference workloads fed by device inputs.

After a great day of discussions, learning and feedback, Devoteam G Cloud held an after-party with over 70 people: customers from all over Europe, as well as product managers, partner engineers, customer engineers and developers from Google. We just hung out and talked about their impressions of the conference and the key takeaways to implement afterwards. It was really great to see the Devoteam teams from France, Belgium, Sweden, Spain and the Netherlands coming together and fostering the innovative, learning culture that I'm proud to be a part of.

Do you want to talk to me or one of our experts?