I have been operating a Kubernetes cluster for a few months now. This website runs on that 8-node cluster, composed of:
The cluster operates with a mix of arm64 and amd64 architectures, which means all container images must be multi-architecture compatible. It's deployed on Talos, a modern, API-driven Linux distribution designed specifically for Kubernetes, with no traditional SSH access. Talos's approach ensures a secure, streamlined infrastructure managed entirely via API. Inspired by the Datavirke blog, the setup follows a similar design but incorporates several unique elements.
As a strong proponent of GitOps, I store the entire configuration of the cluster in Git and deploy it from there. There's a lot more to this approach, which I plan to cover in future posts. For now, here’s an overview of the core technologies being used in this setup:
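As a rough sketch of what that looks like in practice, a Flux-managed cluster along these lines is typically bootstrapped against the Git repository with a command of this shape (the owner and repository below are placeholders, not my actual values):
flux bootstrap github --owner=&lt;github-user&gt; --repository=&lt;cluster-repo&gt; --branch=main --path=clusters/my-cluster --personal
The clusters/my-cluster path here matches the flux-system manifests referenced later in this post.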
I was running Talos v1.7.5 and Kubernetes v1.30.2 when I noticed new releases: Talos v1.8.0 and Kubernetes v1.31.1. Given the redundancy in my multi-node cluster, I was confident that even if issues arose during the upgrade, my website and other services would remain operational. I followed a staged upgrade approach:
I used the following command to generate the upgrade commands based on my cluster configuration:
talhelper gencommand upgrade
This command generated the necessary steps tailored to my setup, ensuring that I didn't overlook anything, particularly since I am using custom builds. I copied the generated commands into a script and started upgrading the first node. The process was smooth, and within an hour my entire cluster was running Talos v1.8.0. Once the cluster upgrade was complete, I also updated my local talosctl and talhelper binaries.
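For reference, the generated output is essentially one talosctl upgrade invocation per node, along these lines (the node IPs, installer image, and talosconfig path are placeholders, not my actual values):
talosctl upgrade --nodes=10.0.0.10 --image=factory.talos.dev/installer/&lt;schematic-id&gt;:v1.8.0 --talosconfig=./clusterconfig/talosconfig
talosctl upgrade --nodes=10.0.0.11 --image=factory.talos.dev/installer/&lt;schematic-id&gt;:v1.8.0 --talosconfig=./clusterconfig/talosconfig
Running them one node at a time keeps the upgrade staged: each node reboots into the new image and rejoins the cluster before the next one is touched.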
After upgrading Talos, the next step was upgrading Kubernetes.
talhelper gencommand upgrade-k8s
This command acts as a simple wrapper for talosctl upgrade-k8s. It only requires one node to initiate the upgrade process, after which it uses Talos discovery to locate all other nodes and upgrade them in sequence. The upgrade to Kubernetes v1.31.1 took approximately 20-30 minutes to complete for the entire cluster, and the process was extremely smooth and painless.
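Put differently, the wrapper ends up issuing something of this shape against a single control-plane node (the IP is a placeholder):
talosctl upgrade-k8s --nodes 10.0.0.10 --to 1.31.1
Talos then walks the control plane and worker components itself, which is why a single entry-point node is all the command needs.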
One of the most impressive aspects of the upgrade was that none of the applications running in the cluster experienced any downtime. However, after the upgrade, FluxCD stopped functioning. Upon investigation, I discovered that Flux v2.3.0 did not support Kubernetes v1.31.1; fortunately, a v2.4.0 release was scheduled for the end of September. I waited a few days for the release and then performed the following steps to upgrade Flux to v2.4.0:
flux install --export > ./clusters/my-cluster/flux-system/gotk-components.yaml
kubectl apply --server-side --force-conflicts -f ./clusters/my-cluster/flux-system/gotk-components.yaml
flux reconcile ks flux-system --with-source
After upgrading Flux, everything started operating smoothly again.
Upgrading both Talos and Kubernetes was a smooth experience overall. The key takeaway, however, is the importance of verifying the compatibility of all components, such as FluxCD, before proceeding with a Kubernetes upgrade. In my case, waiting for FluxCD v2.4.0 first would have avoided the temporary Flux outage, but the risk was acceptable and everything worked out fine in the end.
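One lightweight guard for next time (a generic Flux command, nothing specific to my setup): the CLI ships a preflight check that validates, among other prerequisites, that the Kubernetes version it is talking to is supported by the flux CLI in use.
flux check --pre
Pairing that with a glance at the Flux release notes for the target Kubernetes version would have flagged the v2.3.0 incompatibility before the cluster was upgraded.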