The Ultimate Kubernetes FinOps checklist

A lot has been said about FinOps recently, and the economic climate might have something to do with that. At the same time, Kubernetes adoption is at an all-time high. Rather than offering yet another article on what both are and why they are essential in today’s world (if you need a refresher, look here for FinOps, and here for Kubernetes), this article focuses on applying FinOps practices in an environment built to scale, like Kubernetes.

Spinning up a Kubernetes cluster is relatively easy with managed services like Google Kubernetes Engine. Deploying containers into that cluster becomes more and more straightforward with quality tooling and proper DevOps methodologies. Doing so in a manner that matches the FinOps philosophy, however, can be more challenging.

As a recap, the FinOps Foundation defines the following three phases. Each phase flows into the next, ad infinitum.

Inform: visibility and allocation
Optimise: utilisation rate and opportunities
Operate: continuous operations and improvements

Let’s have a look at how these apply in a Kubernetes environment, and more specifically in a cloud-managed service with scaling capabilities beyond imagination.

Inform

You want to get insights into your Kubernetes usage and the resulting spending. Since Kubernetes, with its namespace-based approach, is built for multi-tenancy, a typical cluster will have multiple teams or departments deploying their applications into it. Therefore, you cannot simply attribute the entire cost of a cluster to a single application, team, or department. Or, if you’re a SaaS company, to an individual customer you are hosting your solution for.

Pro tip: Running on GKE? GKE Cost Allocation is a great way to start.
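
To make the allocation problem concrete, here is a minimal Python sketch that splits the cost of one shared node across namespaces in proportion to the CPU their Pods request. The function name, the namespaces and all figures are hypothetical, and a real allocation would normally take memory and other resources into account as well.

```python
def split_node_cost(node_cost, cpu_requests_by_namespace, node_cpu):
    """Split a node's cost across namespaces by requested CPU.

    Capacity that nobody requested is reported as "idle" so it stays visible.
    """
    shares = {
        namespace: node_cost * cpu / node_cpu
        for namespace, cpu in cpu_requests_by_namespace.items()
    }
    shares["idle"] = node_cost - sum(shares.values())
    return shares


# Hypothetical figures: a 4-vCPU node costing 100 per month.
costs = split_node_cost(100.0, {"team-a": 2.0, "team-b": 1.0}, node_cpu=4.0)
print(costs)  # {'team-a': 50.0, 'team-b': 25.0, 'idle': 25.0}
```

Keeping the idle share as its own entry is a deliberate choice in this sketch: unrequested capacity is still paid for, and hiding it inside the team shares would blur who is responsible for it.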

Going a layer deeper, to the application level, start looking at how efficiently each application uses its resources. When writing your Deployment YAMLs, there are many aspects to consider that impact resource consumption, and therefore cost.

It is crucial to ensure your consumption can be traced back to a specific workload, team or project at any time. Kubernetes, with its label-based approach, lends itself perfectly to this. Additionally, if you’re using GKE, your node pools can be tagged, allowing extra granularity.
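
Traceability only works if the labels are actually present. The following Python sketch, under the assumption of a hypothetical convention that every workload carries a `team` and a `project` label, reports the workloads whose cost could not be attributed. The workload dictionaries are made-up stand-ins for whatever your inventory or API client returns.

```python
REQUIRED_LABELS = ("team", "project")  # hypothetical labelling convention


def missing_cost_labels(workloads):
    """Return {workload name: [missing label keys]} for unattributable workloads."""
    report = {}
    for workload in workloads:
        labels = workload.get("labels", {})
        missing = [key for key in REQUIRED_LABELS if key not in labels]
        if missing:
            report[workload["name"]] = missing
    return report


# Made-up inventory for illustration.
workloads = [
    {"name": "checkout", "labels": {"team": "payments", "project": "shop"}},
    {"name": "batch-import", "labels": {"team": "data"}},
    {"name": "legacy-cron"},
]
label_report = missing_cost_labels(workloads)
print(label_report)
# {'batch-import': ['project'], 'legacy-cron': ['team', 'project']}
```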

Optimise

With this arsenal of information now readily available to all involved teams, it is time to start optimising your fleet. Look at all the data available, and put on your Kubernetes expert hat (if you don’t have one, feel free to get in touch!). Let’s look at some standard optimisations out there:

No CPU, GPU or memory quota on your namespace? A misconfiguration might make resource utilisation explode! Not defining Pod CPU requests but only their limits? Unless a default request is applied at admission time, Kubernetes will copy the limit you specified and use it as the requested value. Now each microservice is reserving its maximum resources in your cluster, regardless of its actual load!
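
The request defaulting described above can be sketched in a few lines of Python. This is an illustration of the rule for a single container, not Kubernetes code: the function name is hypothetical, and it assumes no LimitRange or other admission-time mechanism supplies a default request.

```python
def effective_requests(resources):
    """Mimic the request defaulting described above for one container.

    `resources` mirrors the `resources` block of a container spec.
    """
    requests = dict(resources.get("requests", {}))
    for name, limit in resources.get("limits", {}).items():
        # A limit without a matching request becomes the request.
        requests.setdefault(name, limit)
    return requests


limits_only = {"limits": {"cpu": "2", "memory": "1Gi"}}
explicit = {"requests": {"cpu": "250m"}, "limits": {"cpu": "2", "memory": "1Gi"}}

print(effective_requests(limits_only))  # {'cpu': '2', 'memory': '1Gi'}
print(effective_requests(explicit))     # {'cpu': '250m', 'memory': '1Gi'}
```

In the first case the container reserves two full CPUs on a node even if it idles most of the day. Setting an explicit, measured request, as in the second case, keeps the reservation close to real usage.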

Do you host a memory-intensive stack with relatively low CPU consumption on node pools with a default CPU-to-memory ratio? A large portion of the CPUs will be idle a lot of the time. It gets even worse when your cluster needs to scale out (add more nodes) because you’re reaching its memory limits: every new node adds even more unused CPUs!
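
A quick back-of-the-envelope calculation shows the effect. The Python sketch below uses hypothetical node shapes and workload totals, and it ignores system-reserved capacity and bin-packing effects, so treat the numbers as an illustration only.

```python
import math


def nodes_and_idle_cpu(cpu_needed, memory_needed_gib, node_cpu, node_memory_gib):
    """Return (node count, idle vCPUs) when every node has the same shape."""
    nodes = max(
        math.ceil(cpu_needed / node_cpu),
        math.ceil(memory_needed_gib / node_memory_gib),
    )
    return nodes, nodes * node_cpu - cpu_needed


# Hypothetical workload: 8 vCPU and 120 GiB of memory in total.
general_purpose = nodes_and_idle_cpu(8, 120, node_cpu=4, node_memory_gib=16)
memory_optimised = nodes_and_idle_cpu(8, 120, node_cpu=4, node_memory_gib=32)

print(general_purpose)   # (8, 24): memory forces 8 nodes, 24 vCPUs sit idle
print(memory_optimised)  # (4, 8): half the nodes, a third of the idle vCPUs
```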

How should you configure your app and infra scaling? There are many scaling parameters to configure in a Kubernetes environment, both on the application level (HPA, VPA) and on the infrastructure level (cluster autoscaling). If you do not use these autoscaling parameters, you will need to make educated guesses regarding the allocation of memory and CPU resources. Later on, if you realise that your estimate was incorrect, you will have to revisit and fine-tune it. Why not let that happen automatically, within pre-set boundaries? Misconfiguring a fixed number of pods per deployment, or of nodes in a node pool, will leave you with resources just sitting idle, waiting for workloads that will never come.
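
To illustrate what "automatically, within pre-set boundaries" means for the HPA, here is a simplified Python sketch of its core scaling rule: scale the replica count by the ratio of the observed metric to the target, round up, and clamp the result between the configured minimum and maximum. The function name and figures are hypothetical, and the sketch leaves out details of the real controller such as its tolerance band and stabilisation behaviour.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas, max_replicas):
    """Simplified HPA rule: proportional scaling, clamped to [min, max]."""
    raw = math.ceil(current_replicas * current_value / target_value)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical deployment: target 60% CPU utilisation, between 2 and 10 replicas.
busy = desired_replicas(4, 90, 60, min_replicas=2, max_replicas=10)
quiet = desired_replicas(4, 15, 60, min_replicas=2, max_replicas=10)
spike = desired_replicas(4, 300, 60, min_replicas=2, max_replicas=10)

print(busy, quiet, spike)  # 6 2 10
```

The minimum and maximum are where the FinOps trade-off lives: the minimum is the idle capacity you are willing to pay for, and the maximum caps what a traffic spike or a runaway workload can cost.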

Compute resources? Wasted. Needless costs? Incurred.

Pro tip: Any major cloud provider should be able to give you those insights, as GKE does with its cost-related optimisation metrics.

Operate

Once you have complete visibility of your Kubernetes costs, you know precisely what each application costs. You have optimised your environment so that it scales to perfection. Now you can continuously evaluate the business value of each application deployed in your fleet of clusters against its actual costs, rather than against costs inflated by poor cluster and application management.

The Ultimate Kubernetes FinOps checklist

Not caring about the costs of your Kubernetes cluster might buy you speed for your POC. But once things hit production and the adoption of Kubernetes grows, it’s crucial to follow some FinOps best practices to keep costs under control.

Inform: Leverage all Kubernetes-native options to allocate costs to teams (namespaces, labels…)
Inform: Make sure to check all data made available to you by your cloud provider, and ensure relevant team members have access and are trained in interpreting the data.
Optimise: Ensure your cluster, node pools and deployments are sized appropriately.
Optimise: Use different node pools with additional characteristics, dedicated to the workloads they will host. Taints and Tolerations will help here!
Operate: Cost optimisation is not a one-off! Keep evaluating your costs and comparing them to the business value the applications bring.
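
For the taints and tolerations mentioned in the checklist, the following Python sketch shows roughly how a tainted node pool keeps out workloads that do not belong on it. It is a simplified model of the matching rule rather than scheduler code, and the taint key `workload` and value `gpu` are hypothetical.

```python
def tolerates(toleration, taint):
    """Simplified check of whether one toleration matches one taint."""
    # A toleration without an effect matches taints of any effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # With Exists, an empty key tolerates every taint.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


def can_schedule(pod_tolerations, node_taints):
    """A Pod fits only if every hard taint on the node is tolerated."""
    return all(
        any(tolerates(toleration, taint) for toleration in pod_tolerations)
        for taint in node_taints
        if taint["effect"] in ("NoSchedule", "NoExecute")
    )


# Hypothetical GPU node pool that only GPU workloads should land on.
gpu_pool_taints = [{"key": "workload", "value": "gpu", "effect": "NoSchedule"}]
gpu_job = [{"key": "workload", "operator": "Equal", "value": "gpu",
            "effect": "NoSchedule"}]
web_app = []

print(can_schedule(gpu_job, gpu_pool_taints))  # True
print(can_schedule(web_app, gpu_pool_taints))  # False
```

Note that a toleration only allows a Pod onto the tainted pool. To make sure the GPU job actually runs there, and not on a cheaper general-purpose pool, it is usually combined with a node selector or node affinity.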

Take control of your Kubernetes costs today! 🚀

Implement the FinOps best practices together with Devoteam G Cloud for a successful and cost-effective production environment.