Automating SSL Certificate Renewal
By Ram Simran G (Twitter: @rgarimella0124)
SSL/TLS certificates are the backbone of secure web communications, ensuring encrypted data transfer and user trust. However, managing them manually—especially in dynamic environments like Kubernetes (K8s) clusters running on AWS EC2—can be a nightmare. Renewals every 90 days, generating CSRs, dealing with certificate authorities (CAs), and updating services across load balancers? It’s error-prone, time-consuming, and risky.
In this post, we’ll walk through automating this process using Cert-Manager on Kubernetes, integrated with Let’s Encrypt for free, automated certificates. This setup works whether you’re running a self-managed K8s cluster on EC2 instances or using Amazon EKS (where AWS manages the control plane and your worker nodes run on EC2). The result: zero-touch renewals that kick in around day 60 of each certificate’s 90-day lifetime (30 days before expiry), scalable to dozens of certificates, and massive time savings.
Why Automate? The Hidden Costs of Manual Renewal
Before diving in, let’s quantify the pain. A typical manual renewal cycle might look like this:
| Step | Time Estimate | Pain Points |
|---|---|---|
| Generate CSR | 15 min | Key management errors |
| Submit to CA | 30 min | Approval delays |
| Wait for approval | Variable | Vendor dependencies |
| Update load balancers | 45 min | Downtime risk |
| Test in production | 30 min | Rollback complexity |
| Total per cert | 2 hours | Human error, reminders |
For 47 certificates renewed quarterly, that’s ~376 hours annually—equivalent to nearly 10 full work weeks for one engineer. Scale that across teams, and it’s a productivity black hole.
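As a quick sanity check on that figure, the arithmetic can be reproduced in a few lines of shell. This is only a sketch: the function name annual_hours is hypothetical, and the inputs are the per-certificate total from the table (120 minutes), 47 certificates, and four renewals a year.

```shell
# Estimate yearly manual-renewal effort in whole hours.
# Usage: annual_hours <certs> <renewals_per_year> <minutes_per_renewal>
annual_hours() {
  echo $(( $1 * $2 * $3 / 60 ))
}

annual_hours 47 4 120   # prints 376
```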
How Severe Is This If Not Properly Looked Into?
Ignoring automation isn’t just inefficient; it’s a ticking time bomb. Here’s why it’s severe:
Security Vulnerabilities: Expired certificates trigger browser warnings (e.g., “Your connection is not private”), eroding user trust. Worse, users who learn to click through those warnings, and clients configured to skip validation, become easy targets for man-in-the-middle attacks, leading to data breaches. In regulated industries (e.g., finance, healthcare), this can violate compliance regimes like PCI-DSS or HIPAA, inviting fines up to $50,000 per violation.
Operational Downtime: Services behind invalid certs fail outright—APIs reject requests, websites go dark. A single expired cert on a load-balanced EC2 fleet can cascade to cluster-wide outages. In cloud-native setups, this hits harder: Pods restart, ingresses misroute, and autoscaling panics.
Scalability Nightmares: As your K8s cluster grows (e.g., more EC2 node groups), manual processes don’t scale. One forgotten renewal? Zero uptime for that service. Historical data shows 20-30% of websites run expired certs at any time, correlating with 15-20% revenue dips from lost traffic.
Opportunity Cost: Engineers waste hours on rote tasks instead of innovation. Annually, that’s 30+ hours per team member—time better spent on features or optimizations.
Bottom line: In a world of zero-downtime expectations, expired certs aren’t “embarrassing”; they’re catastrophic. Automation isn’t optional—it’s table stakes for reliability.
Prerequisites: Setting Up Your EC2-Based K8s Environment
Assume you’re running a self-managed K8s cluster on EC2 (e.g., via kubeadm) or EKS. Key requirements:
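- K8s Version: 1.21+ as a baseline, but check Cert-Manager’s supported-releases table; recent releases such as the v1.15 installed in Step 1 expect a newer cluster.
- EC2 Setup: t3.medium+ instances with outbound HTTPS access (to Let’s Encrypt ACME servers), plus an IAM role if you use AWS integrations such as Route53. Security groups must allow inbound port 80 for HTTP-01 challenges and 443 for TLS traffic.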
- Ingress Controller: NGINX Ingress or ALB Ingress for exposing services (Cert-Manager integrates here for auto-TLS).
- Helm: For easy Cert-Manager install (v3+).
- Domain Control: A registered domain with DNS pointing to your EC2 load balancer (e.g., ELB/ALB).
Install kubectl and Helm on your local machine or bastion EC2 instance:

    curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
    sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
    curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

Point kubectl to your cluster:

    kubectl config use-context your-ec2-k8s-context

Step-by-Step: Automating with Cert-Manager and Let’s Encrypt
Cert-Manager is a K8s-native controller that automates the certificate lifecycle using the ACME protocol (the standard Let’s Encrypt uses). It provisions and renews certs and stores them as K8s secrets.
Step 1: Install Cert-Manager
Add the Jetstack Helm repo and install:

    helm repo add jetstack https://charts.jetstack.io
    helm repo update
    helm install cert-manager jetstack/cert-manager \
      --namespace cert-manager \
      --create-namespace \
      --version v1.15.3 \
      --set installCRDs=true

Verify:

    kubectl get pods --namespace cert-manager

Expect cert-manager-* pods running.
Step 2: Configure a ClusterIssuer for Let’s Encrypt
A ClusterIssuer defines how to request certs from Let’s Encrypt. Use HTTP-01 challenge (solves via temporary web server on port 80—ideal for EC2/ALB setups).
Create letsencrypt-prod.yaml:

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-prod
    spec:
      acme:
        server: https://acme-v02.api.letsencrypt.org/directory  # Production
        email: your-admin@yourdomain.com  # For expiry notifications
        privateKeySecretRef:
          name: letsencrypt-prod
        solvers:
          - http01:
              ingress:
                class: nginx  # Or 'alb' for AWS ALB Ingress

Apply:

    kubectl apply -f letsencrypt-prod.yaml

For staging (test first, avoids rate limits):
Replace server with https://acme-staging-v02.api.letsencrypt.org/directory.
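One way to derive the staging issuer without hand-editing is a small sed filter. This is a sketch under assumptions: the function name to_staging and the resulting name letsencrypt-staging are made up for illustration, and the filter simply rewrites the server URL and every letsencrypt-prod name in whatever manifest it reads on stdin.

```shell
# Rewrite a production ClusterIssuer manifest (stdin) into a staging one (stdout).
to_staging() {
  sed -e 's#acme-v02\.api\.letsencrypt\.org#acme-staging-v02.api.letsencrypt.org#' \
      -e 's/letsencrypt-prod/letsencrypt-staging/g'
}

# Example on a single line of the manifest:
echo 'server: https://acme-v02.api.letsencrypt.org/directory' | to_staging
```

Running to_staging < letsencrypt-prod.yaml > letsencrypt-staging.yaml produces a second manifest that can be applied with kubectl apply -f; Certificates then reference letsencrypt-staging in issuerRef.name while you test.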
Step 3: Request Your First Certificate
Define a Certificate resource. This tells Cert-Manager to issue/renew a cert for your domain.
Create example-cert.yaml:

    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: example-tls
      namespace: default  # Your app namespace
    spec:
      secretName: example-tls-secret  # K8s secret to store cert
      issuerRef:
        name: letsencrypt-prod
        kind: ClusterIssuer
      commonName: yourdomain.com
      dnsNames:
        - yourdomain.com
        - www.yourdomain.com
      duration: 2160h  # 90 days
      renewBefore: 720h  # 30 days before expiry, i.e. renew around day 60

Apply:

    kubectl apply -f example-cert.yaml

Cert-Manager will:
- Generate a private key.
- Submit CSR to Let’s Encrypt via ACME.
- Solve challenge by creating a temporary ingress/pod.
- Store PEM cert/key in the named secret.
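Cert-Manager schedules renewal at the certificate’s expiry minus renewBefore, so the day on which renewal starts follows directly from the two durations. A minimal sketch of that arithmetic (the function name renewal_day is hypothetical, and both arguments are whole hours):

```shell
# Day of the certificate's lifetime on which renewal begins.
# Usage: renewal_day <duration_hours> <renew_before_hours>
renewal_day() {
  echo $(( ($1 - $2) / 24 ))
}

renewal_day 2160 720   # prints 60: renew 30 days before a 90-day cert expires
renewal_day 2160 360   # prints 75: renew 15 days before expiry
```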
Check status:

    kubectl describe certificate example-tls -n default
    kubectl get secret example-tls-secret -n default -o yaml

Step 4: Integrate with Ingress for Auto-TLS
Update your Ingress to use the secret:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: example-ingress
      namespace: default
      annotations:
        # Optional: lets Cert-Manager create the Certificate from this Ingress;
        # omit it when you manage the Certificate from Step 3 yourself
        cert-manager.io/cluster-issuer: letsencrypt-prod
    spec:
      ingressClassName: nginx  # Or alb
      tls:
        - hosts:
            - yourdomain.com
          secretName: example-tls-secret
      rules:
        - host: yourdomain.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: your-service
                    port:
                      number: 80

Apply and verify: curl -v https://yourdomain.com should complete the TLS handshake without certificate errors. Avoid -k here, since it skips certificate validation.
For EC2-specific tweaks:
- If using ALB Ingress, ensure the load balancer is internet-facing (or put an NLB in front) so Let’s Encrypt can reach the HTTP-01 challenge on port 80.
- Scale to multiple certs: Repeat Step 3 per domain/namespace. Cert-Manager handles 100+ effortlessly.
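Repeating Step 3 by hand gets tedious past a handful of domains. One possible approach, sketched here, is to render a Certificate manifest per domain from a shell function. The function name render_certificate and the <name>-secret naming scheme are assumptions for illustration; the fields mirror the Certificate from Step 3.

```shell
# Print a Certificate manifest for one domain.
# Usage: render_certificate <name> <namespace> <domain>
render_certificate() {
  cat <<EOF
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: $1
  namespace: $2
spec:
  secretName: $1-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - $3
EOF
}

render_certificate example-tls default yourdomain.com
```

Piping the output to kubectl apply -f - (for example, in a loop over a list of domains) creates one Certificate per domain.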
Step 5: Handle EC2 Load Balancer Updates
In a self-managed K8s on EC2:
- Use the AWS Load Balancer Controller (install via Helm: helm install aws-load-balancer-controller eks/aws-load-balancer-controller).
- If you mix in AWS ACM, annotate Services with service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:... (ALB Ingresses use alb.ingress.kubernetes.io/certificate-arn instead), but stick to K8s secrets for pure automation.
- For renewals: Cert-Manager updates the secret; the ingress controller reloads TLS automatically, with no manual LB config.
In EKS: Nodes run on EC2, but the control plane is managed. ALBs provisioned by the AWS LB Controller terminate TLS with ACM certificates rather than K8s secrets, so Cert-Manager-issued certs are served by an in-cluster ingress controller such as NGINX.
Testing and Monitoring
- Test Renewal: Trigger a manual renewal with cmctl renew example-tls -n default (the cert-manager.io/issue-temporary-certificate annotation only serves a temporary self-signed cert while issuance is pending; it does not force a renewal). Monitor logs: kubectl logs -n cert-manager -l app=cert-manager.
- Production Dry Run: Use staging issuer first.
- Observability: Integrate Prometheus (scrape Cert-Manager metrics) for alerts on failed renewals. Watch the Certificate’s Ready condition.
- Edge Cases: Wildcard certs (*.yourdomain.com) need the DNS-01 solver, e.g. with Route53. Multi-cluster? Use Gateway API.
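For the wildcard case, a DNS-01 issuer backed by Route53 might look like the sketch below, written to a file from the shell. The issuer name letsencrypt-dns, the file name, the region, and the dnsZones selector are all placeholders, and the sketch assumes the Cert-Manager pods can obtain AWS credentials with Route53 permissions (for example through the node’s IAM role).

```shell
# Write a ClusterIssuer that solves DNS-01 challenges through Route53.
cat > letsencrypt-dns.yaml <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns            # placeholder name
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: your-admin@yourdomain.com
    privateKeySecretRef:
      name: letsencrypt-dns
    solvers:
      - selector:
          dnsZones:
            - yourdomain.com       # zone this solver is used for
        dns01:
          route53:
            region: us-east-1      # placeholder region
EOF
```

After kubectl apply -f letsencrypt-dns.yaml, a Certificate can list "*.yourdomain.com" (quoted in YAML) under dnsNames and reference this issuer.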
Real-World Impact: From Chaos to Calm
Post-automation:
- Renewals: Trigger at 60 days, complete in minutes.
- Scale: Manages 47+ certs across namespaces.
- Savings: ~188 engineer hours/year (2 hrs x 4 renewals x 47 certs / 2 for efficiency gains).
- Reliability: Zero expiries, 99.99% uptime.
This setup transformed our EC2 K8s fleet from a renewal roulette to a set-it-and-forget-it powerhouse. Start small—one cert—and scale. Questions? Drop a comment below.
Resources: Cert-Manager Docs, Let’s Encrypt.
Cheers,
Sim