Run database migrations using native Kubernetes constructs

October 14th, 2022

At work, I recently had to migrate the database of a project from a managed NoSQL solution to PostgreSQL. Besides our data model changing entirely (thank you, JSONB), the second biggest change was handling database schema migrations properly.

There are many valid solutions to that problem. One may choose to let their Continuous Deployment (CD) system run the migrations. It is also possible to let the application itself run them. In any case, the only two rules are:

  1. Migrations need to run before the new version of the application starts

  2. If a migration fails, the new version of the application must not be deployed

In our case, we decided to go with the path of least resistance: our CD did not have credentials to our database cluster, but the Kubernetes cluster of our application did. We then decided to go with native Kubernetes constructs to move fast, or so we thought...

Using initContainers

A big part of what Kubernetes does is to schedule and run Docker containers. To do so, it wraps them within Pods, which are the smallest components that can be scheduled. A Pod is not a simple Docker container though: it is a wrapper around one or more containers. A common use case for running two containers inside a single Pod is to use one of them as a sidecar to your application, to forward logs for instance.

While multiple containers may run concurrently within the same Pod, it is not possible to control their order of execution. This is where initContainers come to the rescue. By declaring initContainers, you can run containers that execute tasks which need to complete successfully before the main containers start. The only change we had to make was to bundle our migration directory within our production Docker image.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Values.appName }}
spec:
  replicas: 3
  template:
    spec:
      initContainers:
        - name: db-migration
          image: {{ .Values.image }}
          command:
            - npx
            - prisma
            - migrate
            - deploy
          env:
            - name: "DATABASE_URL"
              value: {{ .Values.databaseUrl }}
          # ...
      containers:
        - name: app
          image: {{ .Values.image }}
          env:
            - name: "DATABASE_URL"
              value: {{ .Values.databaseUrl }}
          # ...

By doing so, each new Pod will first attempt to run migrations. The first one applying them will succeed whereas the other ones will either wait for the database migration lock or do nothing since migrations were already applied.

This should be a solution for 99% of database migration use cases. Sadly, we ran into a specific limitation of Prisma, our ORM and migration tool, and were not able to execute concurrent migrations from different Pods. From there, we had to find a way to execute the migration from a single place.

Using a dedicated Job with Helm Hooks

Because our project was described as a Helm Chart, we discovered from the Helm documentation that it is possible to trigger the execution of a Kubernetes Job before the deployment actually happens.

All we had to do was extract the initContainer into a dedicated Kubernetes Job resource and add the "helm.sh/hook": pre-install,pre-upgrade annotation to it.

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: db-migration
          image: {{ .Values.image }}
          command:
            - npx
            - prisma
            - migrate
            - deploy
          env:
            - name: DATABASE_URL
              value: {{ .Values.databaseUrl }}
          # ...

And voilà. Every time our project was deployed with helm upgrade, that Job would be created and our Pods would get redeployed... But again, no.

Just because it quacks like a duck does not mean it is a duck. In our case, our Kubernetes files were declared within a Helm Chart, but we were only using Helm for its templating capabilities. Under the hood, our internal deployment CLI executes kubectl apply -f, not helm upgrade, so Helm hooks are never triggered.
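To make the distinction concrete: Helm hooks are interpreted by the helm binary at release time. A templating-only flow renders the manifests and applies them directly, so hook annotations come out as inert metadata. The chart path, release name, and values file below are illustrative:

```shell
# Helm-managed release: pre-install/pre-upgrade hooks are honored
helm upgrade --install my-app ./chart --values values.yaml

# Templating-only flow (what our CLI effectively does): manifests are
# rendered and applied directly, so hook annotations are never acted upon
helm template my-app ./chart --values values.yaml | kubectl apply -f -
```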

But limitations foster creativity, right?

Using a dedicated Job with initContainers

At this point, we knew we needed to use native Kubernetes constructs, and to run migrations from a single place.

Our final approach was the right one: combine a single Job to run the migration with initContainers in our project Deployment to make our Pods' main containers wait for the Job to be done.

As of October 2022, if you ever need this, there is an MIT-licensed open-source project named k8s-wait-for maintained by GitHub user groundnuty. However, we could not use it easily because, for security reasons, we could only use Docker images from our private Docker registry. Depending on a relatively new and untrusted Docker image was not an option for us.
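For reference, if pulling public images is an option for you, using k8s-wait-for would look roughly like the snippet below. The image tag and argument syntax are indicative only; check the project's README for the current ones:

```yaml
      initContainers:
        - name: wait-for-migration
          # Public image from the k8s-wait-for project; exact tag may differ
          image: groundnuty/k8s-wait-for:v1.6
          args:
            - "job"
            - "db-migration"
```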

Because we only needed to wait for a single Kubernetes Job, implementing a non-generic version of it was doable without too much hassle. All we had to do was create a new ServiceAccount with the right permissions to introspect Jobs through the Kubernetes API, and request the right endpoint with an authenticated call.

Long story short, it looked like this.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: wait-for-migration
  namespace: {{ .Values.namespace }}

---

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: wait-for-migration
  namespace: {{ .Values.namespace }}
rules:
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["get", "list", "watch"]

---

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: wait-for-migration
  namespace: {{ .Values.namespace }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: wait-for-migration
subjects:
  - kind: ServiceAccount
    name: wait-for-migration
    namespace: {{ .Values.namespace }}

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Values.appName }}
spec:
  replicas: 3
  template:
    spec:
      # The Pod must run under the ServiceAccount declared above,
      # otherwise the mounted token has no permission to read Jobs
      serviceAccountName: wait-for-migration
      initContainers:
        - name: db-migration
          image: {{ .Values.image }}
          command: [ "/bin/bash" ]
          args:
            - "-c"
            - |
              SERVICE_ACCOUNT="/var/run/secrets/kubernetes.io/serviceaccount"
              CACERT="${SERVICE_ACCOUNT}/ca.crt"
              TOKEN="$(cat ${SERVICE_ACCOUNT}/token)"
              URL="https://{{ .Values.cluster }}/apis/batch/v1/namespaces/{{ .Values.namespace }}/jobs/db-migration"

              for i in $(seq 60); do
                sleep 1
                STATUS=$(curl --cacert ${CACERT}                        \
                              --header "Authorization: Bearer ${TOKEN}" \
                              --silent                                  \
                              ${URL} | jq --raw-output '.status.conditions[0].type')
                [ "${STATUS}" = "Complete" ] && exit 0
                [ "${STATUS}" = "Failed" ] && exit 1
              done
              exit 1
      containers:
        - name: app
          image: {{ .Values.image }}
          env:
            - name: "DATABASE_URL"
              value: {{ .Values.databaseUrl }}
          # ...
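Stripped of the Kubernetes specifics, the initContainer script is just a poll-until-success loop. A minimal generic sketch (the `poll_until` name is ours), which is easy to test locally:

```shell
# poll_until CMD TIMEOUT: run CMD once per second until it succeeds
# or TIMEOUT seconds have elapsed; exit status reflects the outcome
poll_until() {
  cmd="$1"
  timeout="${2:-60}"
  i=0
  while [ "$i" -lt "$timeout" ]; do
    if eval "$cmd"; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  return 1
}
```

In the real script, CMD would be the authenticated curl call checking the Job's conditions.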

Conclusion

Even though we would have preferred that our original Prisma approach worked, and even though It Took More Time Than Expected™️, we are quite happy with that compromise. Our database migration process relies only on native Kubernetes constructs, which makes it easier to understand and leaves our team in total control of what is happening.

Also, I deeply believe that the journey is more important than the destination. Along the way, our team learned about initContainers, Helm hooks, authenticated calls to the Kubernetes API, and RBAC authorization; to test all of that, we learned how to run a local Kubernetes cluster using k3d.io. As with many things, learning is failing, and we did learn from this migration!
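For anyone who wants to try this end to end without a real cluster, the k3d workflow boils down to something like the following. The cluster name is arbitrary, and the chart path is illustrative:

```shell
# Spin up a disposable single-node cluster; the kubectl context
# is automatically created and named k3d-migration-demo
k3d cluster create migration-demo

# Render and apply the manifests against it
helm template my-app ./chart | kubectl apply -f -

# Tear everything down when done
k3d cluster delete migration-demo
```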