How to secure your Google Kubernetes Engine cluster with Terraform & Istio

by Devoteam G Cloud, on Feb 10, 2021 2:03:14 PM

In this article, we show you how to harden your Google Kubernetes Engine (GKE) clusters using Terraform & Istio. You will learn which benefits each new security measure brings, and how to configure them with Terraform so they integrate easily into your infrastructure. At the end of this post you will find some of our best practices to follow when developing with Kubernetes and on GKE. This article is a sequel to our previous posts in this series about how to set up secure multi-cluster CI/CD pipelines with Spinnaker on Google Cloud, using Terraform.

This article is written by Kevin Serrano, Google Cloud engineer at Devoteam G Cloud & CTO at Nessie.

In the previous articles of this series, we used Terraform to set up multi-tenant Google Kubernetes Engine (GKE) clusters in multiple environments (Dev, Staging, Prod). We also isolated cloud resources in dedicated Google Cloud Platform (GCP) projects and Kubernetes resources within namespaces. In addition, we securely configured credentials for each namespace and provisioned Spinnaker to manage the workloads on each cluster independently.


Now it's time to bring more security to our GKE clusters!

First, we are going to explore the different types of clusters GKE offers. We will explain the advantages of using a GKE private cluster and learn how to secure access to it.

Second, we will introduce Istio and explain how to use some of its features to control the traffic within the clusters. Third, we are going to explore three security features we can use on GKE to secure our workloads.

Finally, we will review some security best practices in Kubernetes, which can also be applied more generally to multiple architectures.

Table of Contents

  • Public vs. private GKE Clusters
    • Networks and firewall rules
    • Terraform code
  • Istio Service Mesh
    • Customisable Istio installation with Terraform on GKE
  • Securing Workloads
    • Workload Identity
    • Network policies
    • Pod security policies
  • Best practices

Public vs. private GKE Clusters

Public GKE Cluster

GKE offers two types of cluster access: public and private. A public cluster doesn't mean the cluster can be publicly managed; it means the master and node IP addresses are public, and can therefore be reached from anywhere on the internet. Authentication and authorization are still in place to restrict calls to the Kubernetes API server on the master.

When a cluster is created on GKE, this is the default and the least secure option available. Exposing both the master and the nodes with public IPs increases the risk of compromising the cluster and its workloads. For example, it is easier for an attacker to abuse an exploit that lets them SSH into the nodes if those nodes are exposed on the internet.

Private GKE Cluster

On the other hand, a private cluster doesn't assign public IP addresses to the nodes, which reduces the attack surface and the risk of compromising the workloads. The master, which is hosted in a Google-managed project, communicates with the nodes via VPC peering.

This peering is set up automatically by Google and should not be modified or removed. The master IP can either be private only, or both public and private. Both options have their pros and cons.

  • Set the master IP to private only. In this case, the master has no public IP, so the Kubernetes API cannot be reached from the internet at all. This is the most secure option. To access it, we need to configure a connection to the network, for example by deploying a proxy inside the VPC that we reach over Cloud VPN or Cloud Interconnect from an on-premises network. This option can be expensive and complex to set up.
  • Set the master IP to public, with Master Authorized Networks enabled. In this case, the master has both a public and a private IP, while the nodes only have private IPs. Access to the master can be restricted by enabling Master Authorized Networks, which only allows specific IPs to connect to it. This option is a good compromise to get adequate security without needing a dedicated connection to the Kubernetes API as described above.

Private GKE Clusters with a public endpoint

In the setup we define for this article, we choose this last option. We wrote Terraform code to create private clusters with a public endpoint for the master, protected by Master Authorized Networks so that only known IPs can connect to the cluster.

This doesn't replace authentication to the cluster: each user is still required to authenticate in order to use the Kubernetes API running on the master. Access management is still done using GCP IAM roles, which are translated to pre-defined RBAC roles in Kubernetes. For more granularity, using Kubernetes RBAC directly is necessary.
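For example, a more granular, namespace-scoped read-only role for a single user could be sketched with the Terraform Kubernetes provider as follows. This is a sketch only: the namespace, user email and rule set are illustrative assumptions, not part of our setup.

# Sketch: grant one user read-only access to a single namespace.
# The namespace, user email and rule set are assumptions for illustration.
resource "kubernetes_role" "dev_read_only" {
  metadata {
    name      = "read-only"
    namespace = "dev"
  }
  rule {
    api_groups = [""]
    resources  = ["pods", "services", "configmaps"]
    verbs      = ["get", "list", "watch"]
  }
}

resource "kubernetes_role_binding" "dev_read_only" {
  metadata {
    name      = "read-only-binding"
    namespace = "dev"
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "Role"
    name      = kubernetes_role.dev_read_only.metadata.0.name
  }
  subject {
    kind      = "User"
    name      = "jane.doe@example.com" # the user's Google identity
    api_group = "rbac.authorization.k8s.io"
  }
}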

Networks and firewall rules

A VPC-native Cluster is required to configure our desired network components.

Defining private IP ranges

We first need to define private IP ranges for the nodes, services and pods. To do so, we define a subnetwork in our VPC network. A subnet in Google Cloud has a primary IP range and optional secondary ranges (alias IP ranges). We create two secondary ranges within our subnet: GKE assigns node IPs from the primary range, and uses the two secondary ranges for Pods and Services respectively.
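A minimal sketch of such a subnetwork could look like this, assuming illustrative names, region and CIDR ranges (only the resource name vpc_native_cluster matches the one referenced later in the cluster module):

# Sketch: VPC-native subnet with secondary ranges for Pods and Services.
# Name, region, network reference and CIDR ranges are assumptions.
resource "google_compute_subnetwork" "vpc_native_cluster" {
  name          = "gke-subnet"
  region        = "europe-west1"                       # assumption
  network       = google_compute_network.vpc.self_link # assumption
  ip_cidr_range = "10.0.0.0/22" # primary range: node IPs

  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "10.4.0.0/14" # secondary range: Pod IPs
  }
  secondary_ip_range {
    range_name    = "services"
    ip_cidr_range = "10.8.0.0/20" # secondary range: Service IPs
  }

  # See the note on Private Google Access below.
  private_ip_google_access = true
}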

In addition, we create firewall rules to allow internal traffic within the subnetwork. In Google Cloud, we can use service accounts as the source and target parameters of a firewall rule.
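A sketch of such a rule, assuming the nodes run as a dedicated service account defined elsewhere in the setup (the resource names are assumptions):

# Sketch: allow all internal traffic between instances running as the node
# service account. The network and service account references are assumptions.
resource "google_compute_firewall" "allow_internal" {
  name    = "allow-internal"
  network = google_compute_network.vpc.self_link # assumption

  allow {
    protocol = "tcp"
  }
  allow {
    protocol = "udp"
  }
  allow {
    protocol = "icmp"
  }

  source_service_accounts = [google_service_account.gke_nodes.email]
  target_service_accounts = [google_service_account.gke_nodes.email]
}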

Google Health Check

It is recommended to enable Private Google Access, so Google services are reachable without going over the public internet. However, we still need to allow Google health checks to verify the status of the nodes. A firewall ingress rule must be created to allow them (currently, the IP ranges for health checks are 130.211.0.0/22 and 35.191.0.0/16).
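A sketch of that ingress rule, using the documented ranges above (the target service account is the same assumption as in the previous snippet):

# Sketch: allow Google health checks to reach the nodes.
resource "google_compute_firewall" "allow_health_checks" {
  name    = "allow-google-health-checks"
  network = google_compute_network.vpc.self_link # assumption

  allow {
    protocol = "tcp"
  }

  # Documented Google health check ranges (see above).
  source_ranges           = ["130.211.0.0/22", "35.191.0.0/16"]
  target_service_accounts = [google_service_account.gke_nodes.email]
}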

The nodes have no external IP addresses, so by default they cannot reach the internet, for example to pull images from external registries. To allow this, we define a Cloud NAT in front of our nodes. Cloud NAT provides a shared public IP for outbound traffic: unsolicited ingress traffic is still denied, which protects the nodes, while requests initiated by workloads running on the nodes are allowed out.

Because the firewall is stateful, if an egress rule allows an outgoing request, the response is always allowed as well, independently of the ingress rules in place.

We then create a Cloud Router, which Cloud NAT requires to handle this outbound traffic from our network.
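A sketch of the NAT setup, assuming the same illustrative region; the reserved nat_ip address is the one referenced later in the Master Authorized Networks locals, the other names are assumptions:

# Sketch: Cloud Router plus Cloud NAT so the private nodes can reach the
# internet for egress traffic only.
resource "google_compute_address" "nat_ip" {
  name   = "gke-nat-ip"
  region = "europe-west1" # assumption
}

resource "google_compute_router" "router" {
  name    = "gke-router"
  region  = "europe-west1"                      # assumption
  network = google_compute_network.vpc.self_link # assumption
}

resource "google_compute_router_nat" "nat" {
  name                               = "gke-nat"
  router                             = google_compute_router.router.name
  region                             = "europe-west1" # assumption
  nat_ip_allocate_option             = "MANUAL_ONLY"
  nat_ips                            = [google_compute_address.nat_ip.self_link]
  source_subnetwork_ip_ranges_to_nat = "LIST_OF_SUBNETWORKS"

  subnetwork {
    name                    = google_compute_subnetwork.vpc_native_cluster.self_link
    source_ip_ranges_to_nat = ["ALL_IP_RANGES"]
  }
}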


Terraform code

The Terraform resource google_container_cluster describes the parameters to configure a GKE cluster. To configure a GKE private cluster with Master Authorized Networks enabled, we need to configure the private_cluster_config, master_authorized_networks_config and (optionally) ip_allocation_policy fields. By default, GKE reserves private IP ranges for the nodes, pods and services from the VPC network, but it is better to optimise the IP address allocation based on (future) needs.

Because we want to manage the configuration of each IP range used by the Master Authorized Networks configuration, we use dynamic blocks to keep the code DRY. We also define local variables to combine the IPs created by our Terraform setup (e.g. the Spinnaker IP and the cluster NAT IP) with the ones from our configuration.

Sample from the main configuration files (mgmt.tfvars, dev.tfvars, etc.):

master_authorized_network_config = {
  cidr_blocks = [
    {
      display_name = "office"
      cidr_block   = "100.110.120.130/32"
    }
  ]
}


This means we want this IP range, which here contains a single IP (a /32), to be authorised on the corresponding cluster.

Code sample for the GKE cluster definition, in the gke module:

locals {
  cidr_blocks = concat(
    var.master_authorized_network_config.cidr_blocks,
    [
      {
        display_name = "GKE Cluster CIDR"
        cidr_block   = format("%s/32", google_compute_address.nat_ip.address)
      },
      {
        display_name = "Spinnaker CIDR"
        cidr_block   = format("%s/32", var.spinnaker_ip)
      }
    ]
  )
}
resource "google_container_cluster" "gke_environment" {
  ...
  ip_allocation_policy {
    cluster_secondary_range_name  = google_compute_subnetwork.vpc_native_cluster.secondary_ip_range.0.range_name
    services_secondary_range_name = google_compute_subnetwork.vpc_native_cluster.secondary_ip_range.1.range_name
  }

  private_cluster_config {
    enable_private_nodes    = true
    master_ipv4_cidr_block  = var.nwr_master_node # must be a /28
    enable_private_endpoint = false
  }

  master_authorized_networks_config {
    dynamic "cidr_blocks" {
      for_each = [for cidr_block in local.cidr_blocks : {
        display_name = cidr_block.display_name
        cidr_block   = cidr_block.cidr_block
      }]
      content {
        cidr_block   = cidr_blocks.value.cidr_block
        display_name = cidr_blocks.value.display_name
      }
    }
  }
  ...
}

 

Istio Service Mesh


Istio is a complex tool with many features, and it would take too long to explain everything it has to offer here. To learn more about what Istio is and why to use it, we recommend reading the official website and running the provided samples.

The Istio architecture is composed of istiod, which forms the control plane, and a sidecar container called Envoy proxy, which forms the data plane.


The sidecar proxy runs as a second container within the same pod as the application. It handles network traffic and enables additional security features such as mutual TLS (mTLS), which is enabled by default since Istio 1.5.

Customisable Istio installation with Terraform on GKE

There are multiple ways to install Istio on GKE. The simplest is to use the default Istio installation managed by GKE, which can simply be turned on from the console. This method, however, doesn't let us customise the installation.

Up until Istio version 1.5, it was possible to use Helm to install Istio. However, this method is deprecated and no longer recommended.

Recommended installation of Istio

Starting from Istio version 1.6, the recommended way to install Istio is istioctl, which packages the Istio Custom Resource Definitions and the Istio Operator to install Istio Kubernetes resources such as Gateways and Virtual Services. This solution lets us fully customise the Istio installation and makes it easy to update as our needs evolve over time. It also comes with default Istio profiles, pre-defined installation configurations that can be used as a base for custom ones.

In general, Kubernetes allows you to define your own resources using Custom Resource Definitions and Operators.

Because istioctl is a CLI, we can use the official Docker image to install Istio based on a configuration we define. We can integrate the installation into our Terraform setup and add additional resources to automatically make Istio available.

Customisable Istio installation on our clusters

Our goal is to have a customisable Istio installation on our clusters, where we can securely access Kiali and Grafana, which is connected to Prometheus. We also require all the traffic to go through the Istio ingress gateway, and not directly to the applications within the service mesh.

What we need to do:

  • Define the credentials used by our installation process
  • Define the Istio configuration
  • Install Istio based on the configuration
  • Expose Kiali and Grafana with a Global HTTPS Load balancer

We first add an istio module to our Terraform setup. We can then define a Google service account with the roles/container.admin role, as this permission is necessary to install all the Istio resources in the cluster. We then map this GCP service account to a Kubernetes service account using Workload Identity (see the third section for more about Workload Identity).

Then we define a custom profile (istio-config.yaml) based on a pre-defined profile. We can list the predefined profiles by running

istioctl profile list

and then dump the content of a profile (here the default profile):

istioctl profile dump default

This is the easiest way to customise a profile. For example, the following configuration file enables the `istio-ingressgateway` and adds the Grafana, Prometheus, Kiali and Tracing (Jaeger) components:

$ cat istio-config.yaml

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  namespace: istio-system
  name: istio-control-plane
spec:
  profile: default
  values:
    pilot:
      traceSampling: 0.1
    tracing:
      service:
        type: NodePort
    meshConfig:
      outboundTrafficPolicy:
        mode: ALLOW_ANY  # https://istio.io/docs/tasks/traffic-management/egress/egress-gateway/
  components:
    egressGateways:
      - enabled: true
        name: istio-egressgateway
    ingressGateways:
      - enabled: true
        name: istio-ingressgateway
        k8s:
          hpaSpec:
            minReplicas: 1
            maxReplicas: 5
          service:
            type: NodePort
            ports:
              - name: status-port
                port: 15021
                targetPort: 15021
              - name: http2
                port: 8080
                targetPort: 8080
              - name: tls
                port: 15443
                targetPort: 15443
              - name: tcp
                port: 31400
                targetPort: 31400
              - name: https
                port: 443
                targetPort: 443
  addonComponents:
    grafana:
      enabled: true
      k8s:
        service:
          type: NodePort
    prometheus:
      enabled: true
      k8s:
        service:
          type: NodePort
    kiali:
      enabled: true
      k8s:
        service:
          type: NodePort
    tracing:
      enabled: true


Installing Istio

We are now ready to install Istio. To do so, we run a Kubernetes Job (the kubernetes_job Terraform resource) with the proper service account and the configuration mounted as a ConfigMap. The Job runs the installation commands for us: we use multiple init_container blocks to download istioctl and set up the credentials, before finally installing Istio with the provided configuration. The Job, deployed using Terraform, runs the installation command

istioctl install -f istio-config.yaml
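As a rough sketch of what such a Job can look like in Terraform, simplified to a single container (the init containers that download istioctl and set up credentials are omitted; the image tag, ConfigMap name and other names are illustrative assumptions, not our production code):

# Sketch: a Job that installs Istio from the mounted configuration.
# Image reference and ConfigMap name are assumptions.
resource "kubernetes_job" "istio_install" {
  metadata {
    name      = "istio-install"
    namespace = kubernetes_namespace.istio_install.metadata.0.name
  }
  spec {
    backoff_limit = 2
    template {
      metadata {}
      spec {
        service_account_name = kubernetes_service_account.istio_install.metadata.0.name
        restart_policy       = "Never"

        container {
          name    = "istioctl"
          image   = "docker.io/istio/istioctl:1.6.8" # assumption
          command = ["istioctl", "install", "-f", "/config/istio-config.yaml"]

          volume_mount {
            name       = "istio-config"
            mount_path = "/config"
          }
        }

        volume {
          name = "istio-config"
          config_map {
            name = kubernetes_config_map.istio_config.metadata.0.name # assumption
          }
        }
      }
    }
  }
}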


Defining additional custom resources

Finally, we also define additional custom resources. Such resources cannot be created with the Kubernetes Terraform provider at the time of writing. Instead, we create Kubernetes Jobs that run kubectl apply on the custom resources.

The Kubernetes-alpha Terraform provider can apply any manifest, but it is not suitable for production at the time of this writing.

The required custom resources are Gateway and VirtualService (Istio custom resources) and ManagedCertificate (a GKE custom resource). They are used to correctly set up a Kubernetes Ingress with HTTPS using Google-managed certificates and to route the traffic from the load balancer to the backend created by Istio. This makes sure all the traffic is picked up by Istio and routed into the service mesh using Virtual Services. Over time, our application will have more services and we will need to create more Virtual Services to route the traffic properly.

To use Google-managed certificates, it is important to register the IP of the load balancer within a DNS service, otherwise Google won't be able to validate the hostname and the load balancer will not work. Also, Google-managed certificates currently don't support wildcard hostnames such as *.example.com.
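To illustrate how the Ingress in front of the Istio ingress gateway could be expressed with the Terraform Kubernetes provider, here is a sketch; the static IP name, certificate name and backend port are assumptions, and the ManagedCertificate itself is still a custom resource applied through a Job as described above:

# Sketch: a GKE Ingress (global HTTPS load balancer) sending all traffic to
# the istio-ingressgateway service. Annotation values are assumptions.
resource "kubernetes_ingress" "istio_gateway" {
  metadata {
    name      = "istio-gateway-ingress"
    namespace = "istio-system"
    annotations = {
      "kubernetes.io/ingress.global-static-ip-name" = "istio-ingress-ip"   # assumption
      "networking.gke.io/managed-certificates"      = "istio-managed-cert" # assumption
    }
  }
  spec {
    backend {
      service_name = "istio-ingressgateway"
      service_port = 8080 # the http2 NodePort defined in istio-config.yaml
    }
  }
}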

Securing Workloads

In addition to the security measures in place, we can further increase the security of the cluster and workloads by using:

  • Workload Identity (GKE only)
  • Network Policies
  • Pod Security Policies

In this section, we explain what they are and how to set them up with Terraform.

Workload Identity

Workload Identity is a GKE-only feature that fully manages credentials for applications accessing GCP APIs. Without Workload Identity, the usual way to provision credentials to an application running in a pod is to mount the credentials as a Kubernetes secret. In this scenario, the DevOps team must take care of replacing the secrets when keys are rotated, which creates a lot of complexity and security risks.

Workload Identity solves this problem by internally mapping GCP service accounts to the Kubernetes service accounts used by the pods. We can then grant the IAM roles the application needs to the GCP service account, and the pod automatically receives the corresponding credentials as its default credentials, which the application can use.

With Workload Identity, there is no need to manage credentials and mount Kubernetes secrets. This is automatically taken care of by Workload Identity, including key rotations.

Enabling Workload Identity requires a few steps, but can be integrated in the Terraform setup.

1. Enable Workload Identity in the cluster resource for a given project, and enable GKE_METADATA_SERVER on the node pool


resource "google_container_cluster" "gke_environment" {
  ...
  workload_identity_config {
    identity_namespace = "${var.project_id}.svc.id.goog"
  }
}

resource "google_container_node_pool" "auto_scaling" {
  ...
  node_config {
    ...
    workload_metadata_config {
      node_metadata = "GKE_METADATA_SERVER"
    }
  }
}


2. Create a GCP service account and a Kubernetes service account for each application requiring access to GCP APIs, as we did when installing Istio.

resource "google_service_account" "istio_install" {
  account_id = "istio-install"
}

resource "kubernetes_service_account" "istio_install" {
  metadata {
    name      = google_service_account.istio_install.account_id
    namespace = kubernetes_namespace.istio_install.metadata.0.name
    annotations = {
      "iam.gke.io/gcp-service-account" = google_service_account.istio_install.email
    }
  }
}


3. Grant the necessary IAM roles to the application's GCP service account. In addition, we need to grant the GCP service account the roles/iam.workloadIdentityUser role and set the member argument with a specific format, otherwise the mapping will not work.

resource "google_project_iam_member" "istio_install_container_admin" {
  role   = "roles/container.admin"
  member = "serviceAccount:${google_service_account.istio_install.email}"
}

resource "google_service_account_iam_member" "istio_install_workload_identity_user" {
  member = "serviceAccount:${var.project_name}.svc.id.goog[${kubernetes_namespace.istio_install.metadata.0.name}/${kubernetes_service_account.istio_install.metadata.0.name}]"
  role = "roles/iam.workloadIdentityUser"
  service_account_id = google_service_account.istio_install.name
}

 

4. Finally, we need to specify the Kubernetes service account name in the pod specification (serviceAccountName argument)

When the pod is deployed, the default GCP credentials will be available and will be mapped to the GCP service account we defined, with the proper roles.
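For example, a workload using the Workload Identity-enabled Kubernetes service account might be declared like this (a sketch only: the names, namespace and image are assumptions):

# Sketch: a Deployment whose pods run as a Kubernetes service account that is
# annotated for Workload Identity. Names and image are assumptions.
resource "kubernetes_deployment" "workload" {
  metadata {
    name      = "my-app"
    namespace = "my-namespace"
  }
  spec {
    replicas = 1
    selector {
      match_labels = { app = "my-app" }
    }
    template {
      metadata {
        labels = { app = "my-app" }
      }
      spec {
        # Equivalent to serviceAccountName in a plain pod manifest.
        service_account_name = "my-app-ksa" # the annotated Kubernetes service account
        container {
          name  = "my-app"
          image = "gcr.io/my-project/my-app:latest" # assumption
        }
      }
    }
  }
}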

Network policies

Network policies let us control traffic flows between pods at the network layer (OSI layers 3/4). With network policies, we can define which pods are allowed to communicate with which other pods. They can also be used to limit communication between pods in different namespaces.

Istio can also be used to define Istio network policies. These are not meant to replace the native Kubernetes Network Policies, but rather to complement them, as they work at the service level (OSI layer 7). The Istio project wrote an article on this topic which explains when and how to use both.

Network policies must be enabled on the cluster. This can easily be done with Terraform by enabling it in the cluster configuration:

resource "google_container_cluster" "gke_environment" {
...
  addons_config {
    network_policy_config {
      disabled = false
    }
  }
  network_policy {
    enabled = true
    provider = "CALICO"
  }
}

 

By default, if no policies are specified, pods can receive and send traffic from any source to any destination. Once a network policy selects a pod, that pod rejects any traffic not allowed by the policy.

Network policies can be defined in Terraform using the kubernetes_network_policy resource.
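As a sketch, assuming illustrative labels, namespace and port, a policy allowing only frontend pods to reach backend pods could look like this:

# Sketch: only pods labelled app=frontend may reach pods labelled app=backend
# on port 8080. Labels, namespace and port are assumptions.
resource "kubernetes_network_policy" "backend_ingress" {
  metadata {
    name      = "allow-frontend-to-backend"
    namespace = "dev"
  }
  spec {
    pod_selector {
      match_labels = { app = "backend" }
    }
    policy_types = ["Ingress"]

    ingress {
      from {
        pod_selector {
          match_labels = { app = "frontend" }
        }
      }
      ports {
        port     = "8080"
        protocol = "TCP"
      }
    }
  }
}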

Pod security policies

Pod security policies let you specify a set of conditions a pod must satisfy to be accepted by the cluster. For example, we can define a policy which prevents a pod from running if its application runs as root. A policy is assigned to specific users: one user might have the permission to create pods running as root, while other users can't.

Once pod security policies are enabled on the cluster, no pods are admitted by default. This means we first need to define at least one policy in order to be able to deploy any workloads. A recommended restrictive policy is available to download. It requires pods to run as non-root, blocks possible escalations to root, and requires the use of several security mechanisms.

Some components, like Istio, might require additional permissions such as network permissions. We can create an additional policy which allows those and assign them to the service account used by the Istio installation.

Once a policy is applied to the cluster, we can bind it to users using Kubernetes cluster role bindings.

Again, Terraform can be used to easily set up pod security policies for us with the kubernetes_pod_security_policy resource.

1. Define the kubernetes_pod_security_policy. In this case, we are using the recommended policy from above.


resource "kubernetes_pod_security_policy" "restricted" {
  ... # all fields from the resource specification (omitted)
}
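For reference, the omitted spec roughly looks like the following. This is a hedged sketch in the spirit of the recommended restrictive policy, not a verbatim copy of the downloadable one, so review that version before relying on this:

# Sketch of a restrictive policy: non-root pods, no privilege escalation,
# no host namespaces, and only a safe set of volume types.
resource "kubernetes_pod_security_policy" "restricted" {
  metadata {
    name = "restricted"
  }
  spec {
    privileged                 = false
    allow_privilege_escalation = false
    required_drop_capabilities = ["ALL"]
    host_network               = false
    host_ipc                   = false
    host_pid                   = false
    read_only_root_filesystem  = false
    volumes = [
      "configMap",
      "emptyDir",
      "projected",
      "secret",
      "downwardAPI",
      "persistentVolumeClaim",
    ]
    run_as_user {
      rule = "MustRunAsNonRoot"
    }
    se_linux {
      rule = "RunAsAny"
    }
    supplemental_groups {
      rule = "MustRunAs"
      range {
        min = 1
        max = 65535
      }
    }
    fs_group {
      rule = "MustRunAs"
      range {
        min = 1
        max = 65535
      }
    }
  }
}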


2. Grant all authenticated users and service accounts permission to use this policy. This is done with a kubernetes_cluster_role and a kubernetes_cluster_role_binding:

resource "kubernetes_cluster_role" "pod_security_policies" {
  metadata {
    name = "pod-security-policies-role"
  }
  rule {
    api_groups = ["policy"]
    resources  = ["podsecuritypolicies"]
    verbs      = ["use"]
    resource_names = [
      kubernetes_pod_security_policy.restricted.metadata.0.name
    ]
  }
}

resource "kubernetes_cluster_role_binding" "pod_security_policies" {
  metadata {
    name = "pod-security-policies-cluster-role-binding"
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind = "ClusterRole"
    name = kubernetes_cluster_role.pod_security_policies.metadata.0.name
  }
  subject {
    kind = "Group"
    name = "system:serviceaccounts" # all service accounts in all namespaces
    api_group = "rbac.authorization.k8s.io"
  }
  subject {
    kind = "Group"
    name = "system:authenticated"
    api_group = "rbac.authorization.k8s.io"
  }
}


3. Now that we have defined at least one PodSecurityPolicy and allowed all authenticated users and service accounts to use it, we can enable pod security policies on the cluster. This takes effect immediately.

resource "google_container_cluster" "gke_environment" {
  ...
  pod_security_policy_config {
    enabled = true
  }
}


Then we can apply the Terraform code managing the cluster to create or update the configuration.

Best practices

To end this article, we list below a set of best practices to follow when developing with Kubernetes and on GKE. Some concepts are generic and can be applied to other cloud providers as well. They are grouped into 4 categories.

1. GKE cluster security

  • Keep ABAC disabled (it is disabled by default on GKE 1.10+)
  • Use a private cluster. Enable Master Authorized Networks if the master node has a public IP.
  • Don't use the default service accounts on nodes
  • Enable node auto-upgrade in the development environment
  • Install and configure Istio
  • Enable Workload Identity

2. Service accounts and Permissions

  • Use the principle of least-privilege when assigning roles to service accounts (GCP and Kubernetes)
  • Revoke default IAM permissions in GCP
  • Rotate credentials at regular intervals or if compromised. Workload Identity can help

3. Monitoring

  • Enable Audit logs for all relevant APIs
  • Install Prometheus (also part of Istio)
  • Enable tracing with Istio
  • Export Stackdriver and Prometheus logs into an external solution like BigQuery. This reduces the cost and let us keep the logs longer than 30 days
  • Use Kubernetes Engine Monitoring
  • Setup monitoring dashboard and alerts. Grafana can be used in addition to Stackdriver
  • Enable VPC flow logs on a fraction (2%) of traffic

4. Kubernetes

We hope you found this article about how to harden your Google Kubernetes Engine (GKE) cluster with Terraform and Istio useful! 



Want to know more about Google Cloud Platform? 

More about Google Cloud Platform

Need help setting up your own GKE clusters in your organisation, or do you need any other technical guidance? Our experts are there to help!

Contact Us
