
Upgrading Kubernetes
Author: Ram Simran G (twitter: @rgarimella0124)
Kubernetes, the leading container orchestration platform, evolves rapidly with new features, security patches, and performance improvements in each release. Upgrading your cluster is essential to stay current, but it can be daunting due to the potential for downtime, compatibility issues, or misconfigurations. In this detailed blog post, we’ll walk through upgrading a Kubernetes cluster from version 1.31 to 1.32 using a structured, step-by-step script. This minor version upgrade introduces enhancements like improved API stability and bug fixes, but requires careful planning to avoid disruptions.
We’ll explain each step in depth, including why it’s necessary, how to execute it, and common problems that might arise. For each issue, I’ll suggest possible solutions based on real-world experiences from Kubernetes administrators. By the end, you’ll have a solid understanding of the process, complete with code snippets and a Markdown-based flow diagram to visualize the workflow. This guide assumes you’re using a Debian-based system (like Ubuntu) with apt for package management, as it’s common in Kubernetes setups.
Why Upgrade Kubernetes? Understanding the Context
Before diving in, let’s discuss why upgrades matter. Kubernetes follows a semantic versioning scheme: major.minor.patch. A minor upgrade like 1.31 to 1.32 typically involves API deprecations, new features (e.g., better support for dynamic resource allocation in 1.32), and fixes for vulnerabilities. Skipping upgrades can lead to security risks or missing out on optimizations.
However, upgrades aren’t without risks. Control plane components (like etcd, API server) and node-level tools (kubelet, kubectl) must be updated in sequence to maintain cluster stability. Always back up your etcd data and test in a staging environment first. Prerequisites include:
- A healthy cluster (check with `kubectl get nodes` and `kubectl get pods --all-namespaces`).
- Sufficient resources (CPU, memory) on nodes.
- Backups of persistent volumes and etcd snapshots.
- Access to sudo and kubectl as cluster admin.
- Familiarity with your cluster’s add-ons (e.g., CNI like Calico, which may need compatibility checks).
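Since an etcd snapshot is the single most important safety net, here is a minimal sketch of taking one before starting. The endpoints and certificate paths below are the kubeadm defaults and are assumptions; verify them on your cluster before running anything:

```shell
# Sketch: snapshot etcd before upgrading (assumes etcdctl v3 and the default
# kubeadm certificate paths; confirm both for your cluster).

# Build a dated snapshot filename so repeated backups don't overwrite each other.
snapshot_name() {
  echo "etcd-backup-$(date +%Y%m%d-%H%M%S).db"
}

# On a control plane node you would then run something like:
# ETCDCTL_API=3 etcdctl snapshot save "/var/backups/$(snapshot_name)" \
#   --endpoints=https://127.0.0.1:2379 \
#   --cacert=/etc/kubernetes/pki/etcd/ca.crt \
#   --cert=/etc/kubernetes/pki/etcd/server.crt \
#   --key=/etc/kubernetes/pki/etcd/server.key
```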
Now, let’s break down the upgrade process.
The Upgrade Process: Step by Step
This process focuses on a multi-node cluster with at least one control plane. Start with the first control plane node, then additional control planes, and finally worker nodes. Repeat steps for each node where applicable.
Step 1: Configure Package Repository for the New Kubernetes Version
The first step sets up the apt repository for Kubernetes 1.32 packages. This ensures your system can fetch the updated binaries.
Execution:
```shell
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.32/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.32/deb/ /" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
```
Explanation: This downloads the signing key for the 1.32 repository, adds the repo source, and updates the package list. It's required for minor upgrades because each minor version has its own stable repository to isolate dependencies.
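Before relying on the new repo, a quick sanity check can confirm the sources entry references both the keyring you just created and the v1.32 repository. This helper is a sketch for illustration, not part of any official tooling:

```shell
# Sketch: verify a Kubernetes apt sources line references the expected keyring
# and the v1.32 repo. Reads the line on stdin; returns non-zero on mismatch.
repo_line_ok() {
  line=$(cat)
  case "$line" in
    *"signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg"*"v1.32"*) return 0 ;;
    *) return 1 ;;
  esac
}

# On a real node: repo_line_ok < /etc/apt/sources.list.d/kubernetes.list
```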
Potential Problems:
- Network Issues: The curl command fails if there’s no internet access or firewall blocks.
- GPG Key Errors: If gpg isn’t installed or there’s a key conflict.
- Repository Conflicts: Existing sources.list might have overlapping entries, causing apt errors.
- Permission Denied: Running without sudo.
Solutions:
- For network issues, verify connectivity with `ping pkgs.k8s.io` or use a proxy if behind a corporate firewall. Retry after fixing.
- Install gpg if missing: `sudo apt install gnupg`. If keys conflict, remove the old key (`sudo rm /etc/apt/keyrings/kubernetes-apt-keyring.gpg`) and rerun.
- Check for conflicts: `grep kubernetes /etc/apt/sources.list.d/*` and remove duplicates. Use `apt policy` to verify sources.
- Always use sudo; if permission issues persist, check your user's sudoers file.
Step 2: Find the Available Latest 1.32.x Version
Next, query the available patch versions for kubeadm.
Execution:
```shell
sudo apt update
sudo apt-cache madison kubeadm
```
Explanation: This lists the available kubeadm versions (e.g., 1.32.0-1.1). You'll use the latest patch release (replace 'x' in later steps) to ensure you get the most recent fixes.
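If you want to script the version selection, a small helper can pick the newest 1.32 patch from the madison output, which prints lines like `kubeadm | 1.32.1-1.1 | <repo>`. The helper below is an illustration, not part of the official tooling:

```shell
# Illustrative helper: read `apt-cache madison kubeadm` output on stdin and
# print the newest 1.32.x package version string (e.g. "1.32.2-1.1").
latest_132() {
  awk -F'|' '{gsub(/ /, "", $2); print $2}' \
    | grep '^1\.32\.' \
    | sort -rV \
    | head -n 1
}

# On a real node: sudo apt-cache madison kubeadm | latest_132
```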
Potential Problems:
- No Versions Listed: Repository not properly added or apt cache outdated.
- Version Mismatch: Available versions don’t match your target (e.g., only 1.32.0 when 1.32.1 is expected).
- Apt Cache Errors: Corrupted cache or disk space issues.
Solutions:
- Rerun `sudo apt update` and check for errors. Verify the repo with `cat /etc/apt/sources.list.d/kubernetes.list`.
- If there is a mismatch, check the Kubernetes release notes for supported patches. The Kubernetes project releases patches regularly; wait or pin a specific version if needed.
- Clear the cache: `sudo apt clean` and `sudo rm -rf /var/lib/apt/lists/*`, then update again. Free up disk space if full.
Step 3: Upgrade Kubeadm First
Upgrade the kubeadm tool itself.
Execution:
```shell
sudo apt-mark unhold kubeadm &&
sudo apt-get update && sudo apt-get install -y kubeadm='1.32.x-*' &&
sudo apt-mark hold kubeadm
```
Explanation: Unholding allows the upgrade, the install pulls in the new version, and holding again prevents accidental changes later. Kubeadm orchestrates the upgrade, so it must be updated first. Replace `1.32.x` with the patch version found in Step 2, and quote the glob so the shell doesn't expand it.
Potential Problems:
- Dependency Conflicts: New kubeadm requires updated libraries that conflict with current ones.
- Installation Fails: Due to held packages or unmet dependencies.
- Downtime Risk: If run on a live cluster without planning.
Solutions:
- Resolve conflicts by checking the `apt-get install` output and installing missing deps manually (e.g., `sudo apt install <package>`).
- If held packages block the install, use `apt-mark showhold` to list them and unhold temporarily. Use the `--dry-run` flag first to simulate.
- Schedule during maintenance windows; upgrading the kubeadm binary itself doesn't affect running workloads yet.
Step 4: Verify the Upgrade Plan
Plan the upgrade without applying it.
Execution:
```shell
sudo kubeadm upgrade plan
```
Explanation: This checks cluster health and component versions and proposes an upgrade path, highlighting any issues such as deprecated APIs.
Potential Problems:
- Incompatible Components: Add-ons (e.g., Ingress controllers) not ready for 1.32.
- Health Check Failures: Etcd or nodes unhealthy.
- API Deprecations: Warnings about removed features.
Solutions:
- Update add-ons: Check docs for compatibility (e.g., upgrade Calico to a 1.32-compatible version).
- Fix health: check component health (note that `kubectl get cs` is deprecated; on recent clusters, inspect control plane pods with `kubectl get pods -n kube-system` instead); restart failed pods or nodes.
- Address deprecations: Refactor manifests to use new APIs (e.g., migrate from the deprecated Ingress v1beta1 to v1).
Step 5: Apply the Upgrade (First Control Plane Only)
Execute the control plane upgrade.
Execution:
```shell
sudo kubeadm upgrade apply v1.32.x
```
Explanation: This upgrades the API server, controller manager, scheduler, and etcd on the first control plane node. Replace `v1.32.x` with the actual patch version found in Step 2.
Potential Problems:
- Etcd Upgrade Fails: Data corruption or insufficient resources.
- Downtime: Cluster unavailable during upgrade.
- Certificate Issues: Expired or mismatched certs.
Solutions:
- Back up etcd beforehand: `ETCDCTL_API=3 etcdctl snapshot save /path/to/backup.db`. Restore from the snapshot if the upgrade fails.
- For HA clusters, upgrade one control plane at a time to minimize downtime.
- Renew certs: `kubeadm certs renew all`. Check expiry with `kubeadm certs check-expiration`.
Markdown Diagram for Control Plane Upgrade Flow:
```
+-------------------+      +-------------------+      +-------------------+
|   Verify Plan     |  ->  |  Upgrade Apply    |  ->  |   Check Status    |
|  (kubeadm plan)   |      |  (First CP Only)  |      |  (kubectl get cs) |
+-------------------+      +-------------------+      +-------------------+
        ^                                                      |
        |                                                      v
+-------------------+                              +-------------------+
|    Fix Issues     |                              |  Proceed to Next  |
|  (e.g., Add-ons)  |                              |  Control Planes   |
+-------------------+                              +-------------------+
```

Step 6: For Additional Control Planes
Upgrade secondary control planes.
Execution:
```shell
sudo kubeadm upgrade node
```
Explanation: This updates the node's configuration without recreating the control plane, preserving HA.
Potential Problems:
- Sync Issues: Nodes out of sync with primary.
- Network Flaps: During upgrade, API server temporarily unreachable.
Solutions:
- Ensure all nodes have the same kubeadm version. Rerun if sync fails.
- Monitor with `kubectl get nodes`; retry after the network stabilizes.
Step 7: Drain Node Before Kubelet Upgrade
Prepare the node by evicting workloads.
Execution:
```shell
kubectl drain <node-name> --ignore-daemonsets
```
Explanation: Marks the node unschedulable and evicts its pods onto other nodes.
Potential Problems:
- Pod Eviction Failures: Pods with PDBs (Pod Disruption Budgets) block drain.
- No Available Nodes: Single-node cluster or insufficient capacity.
Solutions:
- Relax PDBs temporarily or scale up workloads. Use `--force` cautiously.
- Add nodes or use taints to control scheduling. For a single-node cluster, skip the drain but expect brief downtime.
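In practice the bare drain often fails on pods using emptyDir volumes or hangs on a stuck PDB, so extra flags are commonly added. Here is a hypothetical helper that assembles such a command; the flag set is a suggestion, not a requirement:

```shell
# Hypothetical helper: build a drain command with commonly used safety flags.
# --delete-emptydir-data permits evicting pods that use emptyDir volumes, and
# --timeout stops the drain from hanging forever on a blocked eviction.
drain_cmd() {
  echo "kubectl drain $1 --ignore-daemonsets --delete-emptydir-data --timeout=120s"
}

# Usage on a real cluster: eval "$(drain_cmd worker-1)"
```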
Step 8: Upgrade Kubelet and Kubectl
Update node components.
Execution:
```shell
sudo apt-mark unhold kubelet kubectl &&
sudo apt-get update && sudo apt-get install -y kubelet='1.32.x-*' kubectl='1.32.x-*' &&
sudo apt-mark hold kubelet kubectl
```
Explanation: Upgrades the daemon that runs pods (kubelet) and the CLI tool (kubectl). As before, replace `1.32.x` with the actual patch version.
Potential Problems:
- Version Skew: Kubelet version too far from the API server's.
- Installation Errors: Similar to Step 3.
Solutions:
- Follow the Kubernetes version-skew policy: the kubelet may be older than the API server (up to three minor versions on recent releases) but never newer; ensure compliance.
- Troubleshoot as in Step 3; verify with `kubelet --version`.
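To make the skew rule concrete, here is a tiny sketch that compares minor versions. The three-version window matches the policy for recent releases; older releases allowed less, so treat the bound as an assumption to verify against the docs for your versions:

```shell
# Sketch: check kubelet vs. API-server version skew. Takes "major.minor"
# strings, e.g. `skew_ok 1.32 1.30` (API server first, kubelet second).
# The kubelet must never be newer than the API server, and on recent
# releases may be at most three minor versions older.
skew_ok() {
  api_minor=${1#*.}
  kubelet_minor=${2#*.}
  [ "$kubelet_minor" -le "$api_minor" ] && [ $((api_minor - kubelet_minor)) -le 3 ]
}
```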
Step 9: Restart Kubelet Service
Apply changes.
Execution:
```shell
sudo systemctl daemon-reload
sudo systemctl restart kubelet
```
Explanation: Reloads systemd units and restarts kubelet with the new binary.
Potential Problems:
- Kubelet Fails to Start: Config errors or resource limits.
- Logs Flooded: With errors post-restart.
Solutions:
- Check logs: `journalctl -u kubelet`. Fix configs in /var/lib/kubelet/.
- Increase resources or debug with `kubelet --v=4` for verbose output.
Step 10: Uncordon Node
Re-enable scheduling.
Execution:
```shell
kubectl uncordon <node-name>
```
Explanation: Marks the node schedulable again so new pods can land on it.
Potential Problems:
- Node Not Ready: After uncordon, status stuck in NotReady.
- Pod Scheduling Delays: Due to network policies.
Solutions:
- Wait and check `kubectl describe node <name>`. Restart kubelet if needed.
- Verify CNI pods are healthy.
Step 11: Verify Cluster Status
Final check.
Execution:
```shell
kubectl get nodes
```
Explanation: Ensures all nodes are Ready and report version 1.32.
Potential Problems:
- Nodes Not Upgraded: Some still on 1.31.
- Cluster Instability: Pods crashing post-upgrade.
Solutions:
- Repeat steps for missed nodes.
- Roll back if severe: Downgrade kubeadm and restore etcd snapshot.
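To script the final check, you can feed the kubelet versions reported by `kubectl get nodes` into a helper that fails if any node is still below 1.32. A sketch:

```shell
# Sketch: read one kubelet version per line on stdin (e.g. "v1.32.1") and
# succeed only if every node reports a v1.32 version.
all_on_132() {
  # grep -v selects lines NOT starting with v1.32.; if it finds any,
  # at least one node is behind, so we invert its exit status.
  ! grep -qv '^v1\.32\.'
}

# On a real cluster:
# kubectl get nodes -o jsonpath='{range .items[*]}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' | all_on_132
```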
Visualizing the Overall Upgrade Flow
Here’s a Markdown diagram summarizing the process:
```
Start: Backup Etcd & Cluster State
                  |
                  v
+-------------------+      +-------------------+
| Step 1-2: Repo &  |  ->  | Step 3-4: Upgrade |
|   Version Check   |      |  Kubeadm & Plan   |
+-------------------+      +-------------------+
                  |
                  v
+-------------------+      +--------------------+
| Step 5-6: Control |  ->  | Step 7-10: Node    |
|   Plane Upgrade   |      | Drain, Upgrade,    |
+-------------------+      | Restart, Uncordon  |
                           +--------------------+
                                     |
                                     v
                           +-------------------+
                           |  Step 11: Verify  |
                           |      Cluster      |
                           +-------------------+
                                     |
                                     v
                            End: Monitor & Test
```

Conclusion
Upgrading Kubernetes from 1.31 to 1.32 is a manageable process when broken into steps, but vigilance is key to handling issues like dependencies or health checks. By anticipating problems and having solutions ready, you minimize risks. Post-upgrade, monitor with tools like Prometheus, test workloads, and review release notes for 1.32-specific changes. For larger clusters, consider automation with tools like Cluster API. Remember, practice in a lab first—happy upgrading!
Cheers,
Sim