
GKE Cluster Setup with CDK for Terraform

By Daniel Schmidt
Published in CDK for Terraform
June 08, 2021
4 min read

Building your application layer on top of Kubernetes can be tough; you need to:

  • Get a cluster
  • Get platform wide services like CertManager, Istio, Grafana, Prometheus, etc set up
  • Get developers educated on how to get their service deployed
  • Ensure everyone is following Kubernetes best practices
  • Glue the tooling that deploys the service to the workload deployment tools (Helm or plain kubectl)

Let’s try to simplify this by using the CDK for Terraform. I chose TypeScript as it’s the language I’m most familiar with, but you could use other languages like Go, Python, Java, or C# here as well. If you want to read the full code, please see DanielMSchmidt/blog-kubernetes-infrastructure-with-cdk.

The Cluster

First of all, we need a GKE cluster. We will set it up in a separate TerraformStack so that its Terraform state is separated from our applications’ state. I use a function here to set up the Google provider, as we will need it in the other stacks as well.

import { Construct } from "constructs";
import { App, TerraformStack } from "cdktf";
import {
  ContainerCluster,
  ContainerNodePool,
  ContainerRegistry,
  DataGoogleContainerCluster,
  GoogleProvider,
  ProjectIamMember,
  ServiceAccount,
} from "@cdktf/provider-google";
import { CLUSTER_NAME } from "./config";

const oauthScopes = [
  "https://www.googleapis.com/auth/devstorage.read_only",
  "https://www.googleapis.com/auth/logging.write",
  "https://www.googleapis.com/auth/monitoring",
  "https://www.googleapis.com/auth/servicecontrol",
  "https://www.googleapis.com/auth/service.management.readonly",
  "https://www.googleapis.com/auth/trace.append",
  "https://www.googleapis.com/auth/cloud-platform",
];

function useGoogle(scope: Construct) {
  new GoogleProvider(scope, "providers", {
    zone: "us-central1-c",
    project: "dschmidt-cdk-test",
  });
}

class InfrastructureLayer extends TerraformStack {
  constructor(scope: Construct, name: string) {
    super(scope, name);

    useGoogle(this);

    const sa = new ServiceAccount(this, "sa", {
      accountId: "cluster-admin",
      displayName: "Cluster Admin",
    });

    const pushSa = new ServiceAccount(this, "registry-push", {
      accountId: "registry-push",
      displayName: "RegistryPush",
    });

    new ProjectIamMember(this, "sa-role-binding", {
      role: "roles/storage.admin",
      member: `serviceAccount:${sa.email}`,
    });

    new ProjectIamMember(this, "push-role-binding", {
      role: "roles/storage.admin",
      member: `serviceAccount:${pushSa.email}`,
    });

    new ContainerRegistry(this, "registry", {});

    const cluster = new ContainerCluster(this, "cluster", {
      name: CLUSTER_NAME,
      removeDefaultNodePool: true,
      initialNodeCount: 1,
      nodeConfig: [
        {
          preemptible: true,
          serviceAccount: sa.email,
          oauthScopes,
        },
      ],
    });

    new ContainerNodePool(this, "main-pool", {
      name: "main",
      cluster: cluster.name,
      nodeCount: 3,
      nodeConfig: [
        {
          preemptible: true,
          machineType: "e2-medium",
          serviceAccount: sa.email,
          oauthScopes,
        },
      ],
    });

    new ContainerNodePool(this, "workload-pool", {
      name: "workload",
      cluster: cluster.name,
      autoscaling: [
        {
          minNodeCount: 1,
          maxNodeCount: 10,
        },
      ],
      nodeConfig: [
        {
          preemptible: true,
          machineType: "e2-medium",
          serviceAccount: sa.email,
          oauthScopes,
        },
      ],
    });
  }
}

const app = new App();
new InfrastructureLayer(app, "infrastructure");
app.synth();

Besides the cluster, I also create a separate service account that we are going to use to push Docker images to the Google Container Registry. I configure the cluster with two node pools: the first one is fixed in size for the baseline infrastructure, the second one has autoscaling enabled. The cluster we create looks like this:

Google Cloud Console view of the GKE cluster

This could just as easily be done in Terraform directly, so the only advantage we have gained so far is that we write the configuration in another language. That in itself is a small win: it enables teammates who are not familiar with HCL to better understand what’s going on. If they use a modern editor, they get tooltips that link to the Terraform documentation, like this one:

VSCode tooltip with Terraform Docs link

Accessing the cluster

With the cluster ready, we want to access it to deploy our workloads. To make this easy, I pulled the necessary configuration into a separate function, so that anyone who wants to create a new stack for their application can just call useCluster(this, CLUSTER_NAME) and run everything else in the context of that cluster.
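
Once useCluster exists, a new application stack could look like the minimal sketch below; MyServiceStack and its contents are hypothetical:

// Hypothetical stack that only needs useCluster to talk to the existing GKE cluster
class MyServiceStack extends TerraformStack {
  constructor(scope: Construct, name: string) {
    super(scope, name);

    // configures the Google and Kubernetes providers for this stack
    useCluster(this, CLUSTER_NAME);

    // ...Kubernetes resources for the application go here...
  }
}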

So far we have only used the pre-built provider for Google that you can download directly through your language’s package manager. Now we need to tap into the power of Terraform and pull in some of the thousands of providers and modules. In this case, we are going to use the Kubernetes and Helm providers as well as the GKEAuth module to authenticate against the GKE cluster.

To be able to use these providers and modules we need to configure them in our cdktf.json:

{
  "language": "typescript",
  "app": "npm run --silent compile && node main.js",
  "terraformProviders": [
    "hashicorp/helm@ ~> 2.1.2"
    "hashicorp/kubernetes@ ~> 2.2.0",
    "hashicorp/local@ ~> 2.1.0",

    // we will come to these a little later
    "kreuzwerker/docker@ ~> 2.11.0",
    "hashicorp/null@ ~> 3.1.0", 
  ],
  "terraformModules": ["terraform-google-modules/kubernetes-engine/google//modules/auth@ ~> 14.3.0"],
  "context": {
    "excludeStackIdFromLogicalIds": "true",
    "allowSepCharsInLogicalIds": "true"
  }
}

With this in place, we can run cdktf get, which generates the bindings we can access under ./.gen. The folder might differ depending on the language you are using.
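
For illustration, the generated bindings used in the next snippet could then be imported roughly like this; the exact paths and class names depend on your cdktf version and the providers listed above, so check the contents of ./.gen:

// Illustrative imports for the generated bindings (paths may differ per cdktf version)
import * as path from "path";
import { KubernetesProvider } from "./.gen/providers/kubernetes/kubernetes-provider";
import { File } from "./.gen/providers/local/file";
// the GKEAuth construct comes from the generated module bindings under ./.gen/modules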

The useCluster function uses a data source to get information about the GKE cluster. We pass that information down to the GKEAuth module, whose outputs we use to write a kubeconfig.yaml (so that we can take a look at the cluster locally) and, most importantly, to configure the Kubernetes provider.

function useCluster(scope: Construct, name: string) {
  useGoogle(scope);
  const cluster = new DataGoogleContainerCluster(scope, "cluster", {
    name,
  });

  const auth = new GKEAuth(scope, "auth", {
    clusterName: cluster.name,
    location: cluster.location,
    projectId: cluster.project,
  });

  new File(scope, "kubeconfig", {
    filename: path.resolve(__dirname, "../kubeconfig.yaml"),
    content: auth.kubeconfigRawOutput,
  });

  new KubernetesProvider(scope, "kubernetes", {
    clusterCaCertificate: auth.clusterCaCertificateOutput,
    host: auth.hostOutput,
    token: auth.tokenOutput,
  });

  return auth;
}

Returning the authentication object allows us to configure the Helm provider, which we will need to create our fundamental workloads.

Baseline Setup

To make the cluster production-ready, we need some software that provides fundamental services for our applications to use. Secret management, service mesh capabilities, user authentication: there are lots of concerns that could be handled once for the entire cluster. I will only deploy CertManager as an example of how to deploy Helm charts.

import { CLUSTER_NAME } from "./config";
import { HelmProvider } from "./.gen/providers/helm/helm-provider";
import { Release } from "./.gen/providers/helm/release";

class BaselineLayer extends TerraformStack {
  constructor(scope: Construct, name: string) {
    super(scope, name);

    const auth = useCluster(this, CLUSTER_NAME);

    new HelmProvider(this, "helm", {
      kubernetes: [
        {
          clusterCaCertificate: auth.clusterCaCertificateOutput,
          host: auth.hostOutput,
          token: auth.tokenOutput,
        },
      ],
    });

    new Release(this, "cert-manager", {
      name: "cert-manager",
      repository: "https://charts.jetstack.io",
      chart: "cert-manager",
      createNamespace: true,
      namespace: "cert-manager",
      version: "v1.3.1",
    });
  }
}

The Application

So far we have made use of a lot of Terraform functionality, but we haven’t tapped into the power of the CDK for Terraform yet. In our project setup, we have a few services in the ../services folder that need to be deployed. We don’t want to force anyone to learn much about Kubernetes or Docker to get their application deployed. We also want to ensure folks follow best practices, so let’s see how this can be done:

import * as fs from "fs";
import * as path from "path";
import { DockerProvider } from "./.gen/providers/docker/docker-provider";
import { Namespace } from "./.gen/providers/kubernetes/namespace";
import { application } from "./services";

class ApplicationLayer extends TerraformStack {
  constructor(scope: Construct, name: string) {
    super(scope, name);
    useCluster(this, CLUSTER_NAME);

    new DockerProvider(this, "docker", {});

    const ns = new Namespace(this, "ns", {
      metadata: [
        {
          name,
        },
      ],
    });

    const servicePath = path.resolve(__dirname, "../services");
    fs.readdirSync(servicePath).forEach((p) =>
      application(this, path.resolve(servicePath, p), ns)
    );
  }
}

const app = new App();
new InfrastructureLayer(app, "infrastructure");
new BaselineLayer(app, "baseline");
new ApplicationLayer(app, "development");
new ApplicationLayer(app, "staging");
new ApplicationLayer(app, "production");
app.synth();

We create one of these application layers per environment we want to deploy. Each layer deploys its own namespace, iterates through the service folders, and deploys each application using our application function.

We want to support Node and Rust projects out of the box, so we put standard Dockerfiles in place. In getDockerfileFlag you can see that we compose the -f flag for docker build to either use the default (the Dockerfile inside the context folder) or one of our standard Dockerfiles.

We get the version of the project in the getVersion function so that we can compose the tag for the Docker image.

As you can see, we can act on local files, so we can shape the interface our devs interact with however we like. Provision a Postgres or Redis instance by setting a value in the package.json? Sure, let’s do it. Write a module that defines which env vars to pass from where and at the same time delivers them in the application context? No problem. Your devs want more flexibility and want to write their CDK code right next to their application? As long as you stay in the same language, that’s no problem either.
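
For example, here is a minimal sketch of what acting on a service’s package.json could look like. The cdk.redis flag, the maybeProvisionRedis helper, and the chart details are hypothetical, and the sketch assumes a Helm provider is configured in the stack (as in the BaselineLayer above):

import * as fs from "fs";
import * as path from "path";
import { Construct } from "constructs";
import { Namespace } from "./.gen/providers/kubernetes/namespace";
import { Release } from "./.gen/providers/helm/release";

// Hypothetical helper: provision a Redis instance for a service if its
// package.json opts in via a custom `cdk.redis` flag.
export function maybeProvisionRedis(
  scope: Construct,
  servicePath: string,
  ns: Namespace
) {
  const pkgPath = path.resolve(servicePath, "package.json");
  if (!fs.existsSync(pkgPath)) return;

  const pkg = JSON.parse(fs.readFileSync(pkgPath, "utf-8"));
  if (pkg.cdk && pkg.cdk.redis) {
    const serviceName = path.basename(servicePath);
    // deploy a Redis Helm chart into the environment's namespace
    new Release(scope, `${serviceName}-redis`, {
      name: `${serviceName}-redis`,
      repository: "https://charts.bitnami.com/bitnami",
      chart: "redis",
      namespace: ns.id,
    });
  }
}

The application function could then call such a helper for every service folder it iterates over.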

Remember the service account we set up to push images? We are going to use it now to generate a ServiceAccountKey. Armed with the key, we use a null resource with a local-exec provisioner to run a docker login, build, and push chain that gets our image into GCR.

Once the image is ready, Terraform creates the Deployment for us and sets up a Service according to our definition. The Deployment depends on the image being built, so we need to pass the image resource into the dependsOn array.

import { ITerraformDependable, TerraformAsset } from "cdktf";
import { Construct } from "constructs";
import * as fs from "fs";
import * as path from "path";
import { Deployment } from "./.gen/providers/kubernetes/deployment";
import { Service } from "./.gen/providers/kubernetes/service";
import { VERSION, DOCKER_ORG } from "./config";
import { Resource } from "./.gen/providers/null/resource";
import { Namespace } from "./.gen/providers/kubernetes/namespace";
import {
  DataGoogleServiceAccount,
  ServiceAccountKey,
} from "@cdktf/provider-google";


export function application(scope: Construct, p: string, ns: Namespace) {
  const name = path.basename(p);
  const [image, resource] = buildAndPushImage(scope, name, p);
  service(scope, name, image, ns, [ns, resource]);
}

function buildAndPushImage(
  scope: Construct,
  imageName: string,
  p: string
): [string, Resource] {
  // helper to prefix construct ids with the image name, keeping them unique per service
  const _ = (name: string) => `${imageName}-${name}`;
  const files = fs.readdirSync(p);

  function getDockerfileFlag() {
    if (files.includes("Dockerfile")) {
      return "";
    }

    if (files.includes("package.json")) {
      const asset = new TerraformAsset(scope, _("node-dockerfile"), {
        path: path.resolve(__dirname, "Dockerfile.node"),
      });

      return `-f ${asset.path}`;
    }

    if (files.includes("Cargo.toml")) {
      const asset = new TerraformAsset(scope, _("rust-dockerfile"), {
        path: path.resolve(__dirname, "Dockerfile.rust"),
      });

      return `-f ${asset.path}`;
    }

    throw new Error(
      "Unknown application language, please add a Dockerfile or use node or rust"
    );
  }

  function getVersion(): string {
    if (files.includes("package.json")) {
      return require(path.resolve(p, "package.json")).version;
    }

    return VERSION;
  }

  const dockerfileFlag = getDockerfileFlag();
  const content = new TerraformAsset(scope, _("content"), {
    path: p,
  });

  const sa = new DataGoogleServiceAccount(scope, _("sa"), {
    accountId: "registry-push",
  });

  const key = new ServiceAccountKey(scope, _("sa-key"), {
    serviceAccountId: sa.email,
  });

  const version = getVersion();
  

  const tag = `gcr.io/${DOCKER_ORG}/${imageName}:${version}-${content.assetHash}`;
  const image = new Resource(scope, _("image"), {
    triggers: {
      tag,
    },
  });
  

  // log in to GCR with the service account key, then build and push the image
  // (note: `base64 -D` is the macOS flag; on Linux it is `base64 -d`)
  const cmd = `echo '${key.privateKey}' | base64 -D | docker login -u _json_key --password-stdin https://gcr.io && docker build ${dockerfileFlag} -t ${tag} ${content.path} && docker push ${tag}`;
  image.addOverride("provisioner.local-exec.command", cmd);

  return [tag, image];
}

function service(
  scope: Construct,
  name: string,
  image: string,
  ns: Namespace,
  dependencies: ITerraformDependable[]
) {
  const labels = { application: name, deployedBy: "cdktf" };
  const deployment = new Deployment(scope, `${name}-deployment`, {
    dependsOn: dependencies,
    metadata: [
      {
        name,
        labels,
        namespace: ns.id,
      },
    ],
    spec: [
      {
        selector: [
          {
            matchLabels: labels,
          },
        ],
        template: [
          {
            metadata: [
              {
                labels,
              },
            ],
            spec: [
              {
                container: [
                  {
                    name: "application",
                    image,
                    port: [{ containerPort: 80 }],
                    livenessProbe: [
                      {
                        httpGet: [
                          {
                            path: "/health",
                            port: "80",
                          },
                        ],
                      },
                    ],
                  },
                ],
              },
            ],
          },
        ],
      },
    ],
  });

  new Service(scope, `${name}-service`, {
    dependsOn: [...dependencies, deployment],
    metadata: [{ name, namespace: ns.id }],
    spec: [
      {
        selector: { application: name },
        port: [{ port: 80 }],
      },
    ],
  });
}

Running cdktf apply <stack> for development, staging, and production deploys the workloads directly to the cluster.

GKE Workloads view

Summary

We set out to

  • Get a cluster
  • Get platform wide services like CertManager, Istio, Grafana, Prometheus, etc set up
  • Get developers educated on how to get their service deployed
  • Ensure everyone is following Kubernetes best practices
  • Glue the tooling that deploys the service to the workload deployment tools (Helm or plain kubectl)

and we achieved all of these, at least to some degree. I like that the CDK allows you to find abstractions over infrastructure that fit your team. Which abstractions you choose depends on the company, the setup, and how familiar your developers are with infrastructure. I am certain the abstractions I chose are not the right ones for your use case, but luckily it’s quite easy to change them and introduce new ones that work for you.

The last thing to mention is that, in general, infrastructure automation should run inside a CI system (e.g. Google Cloud Build if you are already working on GCP), so that access and the execution environment can be controlled in one place.


Tags

#cdktf #docker #gke #google