Upgrading Kubernetes

Kubernetes, the leading container orchestration platform, evolves rapidly with new features, security patches, and performance improvements in each release. Upgrading your cluster is essential to stay current, but it can be daunting due to the potential for downtime, compatibility issues, or misconfigurations. In this detailed blog post, we’ll walk through upgrading a Kubernetes cluster from version 1.31 to 1.32 using a structured, step-by-step script. This minor version upgrade introduces enhancements like improved API stability and bug fixes, but requires careful planning to avoid disruptions.

We’ll explain each step in depth, including why it’s necessary, how to execute it, and common problems that might arise. For each issue, we’ll suggest possible solutions based on real-world experiences from Kubernetes administrators. By the end, you’ll have a solid understanding of the process, complete with code snippets and text-based flow diagrams to visualize the workflow. This guide assumes you’re using a Debian-based system (like Ubuntu) with apt for package management, as it’s common in Kubernetes setups.

Why Upgrade Kubernetes? Understanding the Context

Before diving in, let’s discuss why upgrades matter. Kubernetes follows a semantic versioning scheme: major.minor.patch. A minor upgrade like 1.31 to 1.32 typically involves API deprecations, new features (e.g., better support for dynamic resource allocation in 1.32), and fixes for vulnerabilities. Skipping upgrades can lead to security risks or missing out on optimizations.

However, upgrades aren’t without risks. Control plane components (like etcd, API server) and node-level tools (kubelet, kubectl) must be updated in sequence to maintain cluster stability. Always back up your etcd data and test in a staging environment first. Prerequisites include:

  • A healthy cluster (check with kubectl get nodes and kubectl get pods --all-namespaces).
  • Sufficient resources (CPU, memory) on nodes.
  • Backups of persistent volumes and etcd snapshots.
  • Access to sudo and kubectl as cluster admin.
  • Familiarity with your cluster’s add-ons (e.g., CNI like Calico, which may need compatibility checks).
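As a quick sanity check before touching anything, a small script can confirm the tools this guide relies on are actually installed (a minimal sketch; the cluster-health commands are left as comments since they need access to a live cluster):

```shell
#!/bin/sh
# Pre-flight sketch: confirm the binaries used throughout this guide exist.
for bin in kubectl kubeadm kubelet curl gpg; do
  if command -v "$bin" >/dev/null 2>&1; then
    echo "OK:      $bin"
  else
    echo "MISSING: $bin (install before proceeding)"
  fi
done

# With cluster access, also confirm overall health, e.g.:
#   kubectl get nodes
#   kubectl get pods --all-namespaces
```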

Now, let’s break down the upgrade process.

The Upgrade Process: Step by Step

This process focuses on a multi-node cluster with at least one control plane node. Start with the first control plane node, then any additional control plane nodes, and finally the worker nodes. Repeat the per-node steps on each node where applicable.

Step 1: Configure Package Repository for the New Kubernetes Version

The first step sets up the apt repository for Kubernetes 1.32 packages. This ensures your system can fetch the updated binaries.

Execution:

sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.32/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.32/deb/ /" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update

Explanation: This downloads the signing key for the 1.32 repository, adds the repo source, and updates the package list. It’s required for minor upgrades because each version has its own stable repository to isolate dependencies.

Potential Problems:

  • Network Issues: The curl command fails if there’s no internet access or firewall blocks.
  • GPG Key Errors: If gpg isn’t installed or there’s a key conflict.
  • Repository Conflicts: Existing sources.list might have overlapping entries, causing apt errors.
  • Permission Denied: Running without sudo.

Solutions:

  • For network issues, verify connectivity with ping pkgs.k8s.io or use a proxy if behind a corporate firewall. Retry after fixing.
  • Install gpg if missing: sudo apt install gnupg. If key conflicts, remove old keys: sudo rm /etc/apt/keyrings/kubernetes-apt-keyring.gpg and rerun.
  • Check for conflicts: grep kubernetes /etc/apt/sources.list.d/* and remove duplicates. Use apt policy to verify sources.
  • Always use sudo; if permission issues persist, check your user’s sudoers file.

Step 2: Find the Latest Available 1.32.x Patch Version

Next, query the available patch versions for kubeadm.

Execution:

sudo apt update
sudo apt-cache madison kubeadm

Explanation: This lists the available kubeadm versions (e.g., 1.32.0-1.1). Use the latest patch release (it replaces the ‘x’ placeholder in later steps) to ensure you get the most recent fixes.

Potential Problems:

  • No Versions Listed: Repository not properly added or apt cache outdated.
  • Version Mismatch: Available versions don’t match your target (e.g., only 1.32.0 when 1.32.1 is expected).
  • Apt Cache Errors: Corrupted cache or disk space issues.

Solutions:

  • Rerun sudo apt update and check for errors. Verify the repo with cat /etc/apt/sources.list.d/kubernetes.list.
  • If mismatch, confirm Kubernetes release notes for supported patches. The Kubernetes project releases patches regularly; wait or use a specific version if needed.
  • Clear cache: sudo apt clean and sudo rm -rf /var/lib/apt/lists/*, then update again. Free up disk space if full.
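To avoid copying the patch number by hand, the madison output can be parsed in shell. A sketch, shown against sample output so the logic is readable without a repository (your mirror URLs and patch numbers will differ):

```shell
# Hypothetical helper: pick the newest version from `apt-cache madison` output.
# madison prints lines like: "kubeadm | 1.32.1-1.1 | https://... amd64 Packages"
latest_patch() {
  awk -F'|' '{gsub(/ /, "", $2); print $2}' | sort -V | tail -1
}

# Sample madison output (illustrative):
sample='kubeadm | 1.32.0-1.1 | https://pkgs.k8s.io/... amd64 Packages
kubeadm | 1.32.1-1.1 | https://pkgs.k8s.io/... amd64 Packages'

echo "$sample" | latest_patch   # -> 1.32.1-1.1
# On a real node: apt-cache madison kubeadm | latest_patch
```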

Step 3: Upgrade Kubeadm First

Upgrade the kubeadm tool itself.

Execution:

sudo apt-mark unhold kubeadm && 
sudo apt-get update && sudo apt-get install -y kubeadm=1.32.x-* && 
sudo apt-mark hold kubeadm

Explanation: Unholding allows the package to be upgraded, the install pulls in the new version (replace ‘x’ with the patch number found in Step 2), and re-holding prevents accidental changes later. Kubeadm orchestrates the upgrade of the rest of the cluster, so it must be updated first.

Potential Problems:

  • Dependency Conflicts: New kubeadm requires updated libraries that conflict with current ones.
  • Installation Fails: Due to held packages or unmet dependencies.
  • Downtime Risk: If run on a live cluster without planning.

Solutions:

  • Resolve conflicts by checking apt-get install output and installing missing deps manually (e.g., sudo apt install <package>).
  • If held packages block, use apt-mark showhold to list and unhold temporarily. Use --dry-run flag first to simulate.
  • Schedule during maintenance windows; kubeadm upgrade doesn’t affect running workloads yet.
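The unhold/install/hold sequence can be wrapped in a helper that pins a single version variable and simulates the install before applying it. A sketch — `upgrade_kubeadm` and the example version are placeholders, not an official tool:

```shell
# Sketch: upgrade kubeadm to one pinned version, simulating before installing.
upgrade_kubeadm() {
  v="$1"   # e.g. "1.32.1-1.1" -- the patch found in Step 2
  sudo apt-mark unhold kubeadm &&
  sudo apt-get update &&
  sudo apt-get install -y --dry-run "kubeadm=${v}" &&   # simulate only
  sudo apt-get install -y "kubeadm=${v}" &&
  sudo apt-mark hold kubeadm
}
# Usage on a node: upgrade_kubeadm 1.32.1-1.1
```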

Step 4: Verify the Upgrade Plan

Plan the upgrade without applying it.

Execution:

sudo kubeadm upgrade plan

Explanation: This checks cluster health, component versions, and proposes an upgrade path, highlighting any issues like deprecated APIs.

Potential Problems:

  • Incompatible Components: Add-ons (e.g., Ingress controllers) not ready for 1.32.
  • Health Check Failures: Etcd or nodes unhealthy.
  • API Deprecations: Warnings about removed features.

Solutions:

  • Update add-ons: Check docs for compatibility (e.g., upgrade Calico to a 1.32-compatible version).
  • Fix health: note that kubectl get cs (componentstatuses) has been deprecated since v1.19; check control plane health with kubectl get pods -n kube-system instead, and restart failed pods or nodes.
  • Address deprecations: Refactor manifests to use new APIs (e.g., migrate from deprecated Ingress v1beta1 to v1).

Step 5: Apply the Upgrade (First Control Plane Only)

Execute the control plane upgrade.

Execution:

sudo kubeadm upgrade apply v1.32.x

Explanation: This upgrades API server, controller manager, scheduler, and etcd on the first control plane node.

Potential Problems:

  • Etcd Upgrade Fails: Data corruption or insufficient resources.
  • Downtime: Cluster unavailable during upgrade.
  • Certificate Issues: Expired or mismatched certs.

Solutions:

  • Backup etcd before: ETCDCTL_API=3 etcdctl snapshot save /path/to/backup.db. Restore if failed.
  • For HA clusters, upgrade one control plane at a time to minimize downtime.
  • Renew certs: kubeadm certs renew all. Check expiry with kubeadm certs check-expiration.
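On a kubeadm cluster the etcd backup one-liner above also needs TLS flags. A fuller sketch, assuming the default kubeadm certificate paths (adjust paths and endpoint for your setup):

```shell
# Sketch: snapshot etcd before `kubeadm upgrade apply`, using the default
# kubeadm certificate locations. Run on a control plane node.
backup_etcd() {
  ETCDCTL_API=3 etcdctl snapshot save "${1:-/var/backups/etcd-pre-1.32.db}" \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key
}
# Usage: backup_etcd /var/backups/etcd-$(date +%F).db
```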

Text Diagram for Control Plane Upgrade Flow:

+-------------------+    +-------------------+    +-------------------+
|   Verify Plan     | -> |   Upgrade Apply   | -> |   Check Status    |
|   (kubeadm plan)  |    |   (First CP Only) |    |(kubectl get nodes)|
+-------------------+    +-------------------+    +-------------------+
          ^                                                  |
          |                                                  v
+-------------------+                             +-------------------+
|   Fix Issues      |                             |   Proceed to Next |
|   (e.g., Add-ons) |                             |   Control Planes  |
+-------------------+                             +-------------------+

Step 6: For Additional Control Planes

Upgrade secondary control planes.

Execution:

sudo kubeadm upgrade node

Explanation: On each additional control plane node, this upgrades the local control plane components and kubelet configuration without re-running the full apply, preserving HA.

Potential Problems:

  • Sync Issues: Nodes out of sync with primary.
  • Network Flaps: During upgrade, API server temporarily unreachable.

Solutions:

  • Ensure all nodes have the same kubeadm version. Rerun if sync fails.
  • Monitor with kubectl get nodes; retry after network stabilizes.

Step 7: Drain Node Before Kubelet Upgrade

Prepare the node by evicting workloads.

Execution:

kubectl drain <node-name> --ignore-daemonsets

Explanation: Marks node unschedulable and moves pods to other nodes.

Potential Problems:

  • Pod Eviction Failures: Pods with PDBs (Pod Disruption Budgets) block drain.
  • No Available Nodes: Single-node cluster or insufficient capacity.

Solutions:

  • Relax PDBs temporarily or scale up workloads. Add --delete-emptydir-data if pods using emptyDir volumes block the drain; use --force only as a last resort, since it evicts unmanaged pods.
  • Add nodes or use taints to control scheduling. For single-node, skip but expect brief downtime.
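A small wrapper can make drains less surprising by bounding the wait and surfacing what blocked them. A sketch using standard kubectl drain flags; `drain_node` is a hypothetical helper:

```shell
# Sketch: drain with a timeout; on failure, list PDBs that may be blocking.
drain_node() {
  kubectl drain "$1" \
    --ignore-daemonsets \
    --delete-emptydir-data \
    --timeout=120s || {
      echo "Drain of $1 blocked; check PodDisruptionBudgets:"
      kubectl get pdb --all-namespaces
    }
}
# Usage: drain_node worker-1
```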

Step 8: Upgrade Kubelet and Kubectl

Update node components.

Execution:

sudo apt-mark unhold kubelet kubectl && 
sudo apt-get update && sudo apt-get install -y kubelet=1.32.x-* kubectl=1.32.x-* && 
sudo apt-mark hold kubelet kubectl

Explanation: Upgrades the daemon that runs pods and the CLI tool.

Potential Problems:

  • Version Skew: Kubelet newer than the API server, or lagging it by too many minor versions.
  • Installation Errors: Similar to Step 3.

Solutions:

  • Kubelet must never be newer than the API server and may lag it by up to three minor versions (two before Kubernetes 1.28); upgrading the control plane first keeps you compliant.
  • Troubleshoot as in Step 3; verify with kubelet --version.

Step 9: Restart Kubelet Service

Apply changes.

Execution:

sudo systemctl daemon-reload
sudo systemctl restart kubelet

Explanation: Reloads systemd and restarts kubelet with new binary.

Potential Problems:

  • Kubelet Fails to Start: Config errors or resource limits.
  • Logs Flooded: With errors post-restart.

Solutions:

  • Check logs: journalctl -u kubelet. Fix configs in /var/lib/kubelet/.
  • Increase resources or debug with kubelet --v=4 for verbose output.
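Restart and verification can be combined so a failed start surfaces recent logs immediately. A sketch to run on the node being upgraded (`restart_kubelet` is a hypothetical helper):

```shell
# Sketch: restart kubelet and show recent logs if it did not come back.
restart_kubelet() {
  sudo systemctl daemon-reload
  sudo systemctl restart kubelet
  sudo systemctl is-active --quiet kubelet ||
    sudo journalctl -u kubelet --since '5 min ago' --no-pager | tail -n 30
}
```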

Step 10: Uncordon Node

Re-enable scheduling.

Execution:

kubectl uncordon <node-name>

Explanation: Removes unschedulable taint.

Potential Problems:

  • Node Not Ready: After uncordon, status stuck in NotReady.
  • Pod Scheduling Delays: Due to network policies.

Solutions:

  • Wait and check kubectl describe node <name>; if the node stays NotReady, restart kubelet on it.
  • Verify CNI pods are healthy.
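Uncordoning and waiting for Ready can be combined so you don’t move on with a broken node. A sketch using the standard kubectl wait condition (`uncordon_and_wait` is a hypothetical helper):

```shell
# Sketch: uncordon, then wait up to two minutes for the node to be Ready.
uncordon_and_wait() {
  kubectl uncordon "$1" &&
  kubectl wait --for=condition=Ready "node/$1" --timeout=120s
}
# Usage: uncordon_and_wait worker-1
```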

Step 11: Verify Cluster Status

Final check.

Execution:

kubectl get nodes

Explanation: Ensures all nodes are Ready and at 1.32.

Potential Problems:

  • Nodes Not Upgraded: Some still on 1.31.
  • Cluster Instability: Pods crashing post-upgrade.

Solutions:

  • Repeat steps for missed nodes.
  • Roll back if severe: Downgrade kubeadm and restore etcd snapshot.
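One way to spot stragglers is to filter the VERSION column of kubectl get nodes. The parsing is shown against sample output so it can be read without a cluster (the sample node names are illustrative):

```shell
# Sample `kubectl get nodes` output (illustrative):
sample='NAME      STATUS   ROLES           AGE   VERSION
cp-1      Ready    control-plane   90d   v1.32.1
worker-1  Ready    <none>          90d   v1.31.4'

# Print any node whose kubelet is not yet on 1.32:
echo "$sample" | awk 'NR>1 && $5 !~ /^v1\.32\./ {print $1, "still on", $5}'
# -> worker-1 still on v1.31.4

# On a real cluster:
#   kubectl get nodes --no-headers | awk '$5 !~ /^v1\.32\./ {print $1}'
```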

Visualizing the Overall Upgrade Flow

Here’s a text diagram summarizing the process:

Start: Backup Etcd & Cluster State
          |
          v
+-------------------+    +-------------------+
| Step 1-2: Repo &  | -> | Step 3-4: Upgrade |
| Version Check     |    | Kubeadm & Plan    |
+-------------------+    +-------------------+
          |
          v
+-------------------+    +-------------------+
| Step 5-6: Control | -> | Step 7-10: Node   |
| Plane Upgrade     |    | Drain, Upgrade,   |
+-------------------+    | Restart, Uncordon |
                         +-------------------+
          |
          v
+-------------------+
| Step 11: Verify   |
| Cluster           |
+-------------------+
          |
          v
End: Monitor & Test

Conclusion

Upgrading Kubernetes from 1.31 to 1.32 is a manageable process when broken into steps, but vigilance is key to handling issues like dependencies or health checks. By anticipating problems and having solutions ready, you minimize risks. Post-upgrade, monitor with tools like Prometheus, test workloads, and review release notes for 1.32-specific changes. For larger clusters, consider automation with tools like Cluster API. Remember, practice in a lab first—happy upgrading!

Cheers,

Sim