
Mastering Kubernetes Logs
- Author: Ram Simran G
- Twitter: @rgarimella0124
Kubernetes has revolutionized container orchestration, but with great power comes great complexity. When issues arise in your cluster, logs become your best friend for troubleshooting and understanding what’s happening under the hood. However, Kubernetes generates logs at multiple levels, each serving different purposes and containing different types of information.
Understanding where to look and what each log type tells you can mean the difference between quickly resolving an issue and spending hours debugging in the dark. This comprehensive guide will walk you through every type of Kubernetes log, where to find them, and how to interpret what they’re telling you.
The Kubernetes Logging Landscape
Kubernetes logging operates on multiple layers, from individual containers to the entire cluster infrastructure. Each layer provides unique insights into different aspects of your system’s behavior. Let’s explore each type systematically.
Container and Pod Level Logging
Container Logs: The Foundation of Troubleshooting
Location: /var/log/containers/*.log
Container logs are often your first stop when debugging application issues. These logs capture everything your application writes to stdout and stderr, making them essential for understanding application behavior, errors, and performance issues.
What you’ll find here:
- Application error messages and stack traces
- Runtime exceptions and crashes
- Configuration problems
- Performance bottlenecks
- Custom application logging output
Common scenarios for checking container logs:
- Your application is crashing repeatedly
- Users report errors or unexpected behavior
- Performance issues like slow response times
- Configuration validation problems
Pro tip: Use kubectl logs <pod-name> -c <container-name> to access these logs directly, or kubectl logs <pod-name> --previous to see logs from a crashed container.
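For quick reference, here are the kubectl invocations I reach for most often at this level; the pod, container, and label names are placeholders:
# Follow the last 100 lines of a specific container
kubectl logs <pod-name> -c <container-name> --tail=100 -f
# Logs from the previous (crashed) instance of the container
kubectl logs <pod-name> -c <container-name> --previous
# Logs from all pods matching a label, across all of their containers
kubectl logs -l app=<app-label> --all-containers=true --since=1h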
Pod Logs: Understanding Multi-Container Interactions
Location: /var/log/pods/*.log
Pod logs provide a broader view than individual container logs, especially valuable when dealing with multi-container pods. These logs help you understand how containers within a pod interact with each other and can reveal issues that aren’t apparent when looking at containers in isolation.
What you’ll find here:
- Inter-container communication issues
- Shared volume problems
- Network connectivity issues between containers
- Resource contention between containers in the same pod
- Init container problems affecting main containers
When to check pod logs:
- Sidecar containers aren’t working correctly
- Service mesh issues (like Istio proxy problems)
- Init containers are failing
- Containers can’t communicate within the pod
- Shared volume mounting issues
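When a multi-container pod is misbehaving, these commands (placeholder names again) help separate the pod-level view from the individual containers:
# Logs from every container in the pod, prefixed with the container name
kubectl logs <pod-name> --all-containers=true --prefix
# A specific init container that may be blocking startup
kubectl logs <pod-name> -c <init-container-name>
# Pod events: volume mounts, probe failures, restarts
kubectl describe pod <pod-name>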
Kubelet: The Node Agent’s Perspective
Location: /var/log/kubelet.log (on systemd-managed nodes the kubelet typically logs to the journal instead; use journalctl -u kubelet)
The kubelet is the primary node agent responsible for managing pods and containers on each node. Its logs provide crucial insights into pod lifecycle management, resource allocation, and communication with the control plane.
What you’ll find here:
- Pod scheduling and startup issues
- Resource allocation problems (CPU, memory, storage)
- Image pulling failures
- Volume mounting problems
- Node health and capacity issues
- Communication problems with the API server
Critical scenarios for kubelet logs:
- Pods are stuck in “Pending” state
- Containers fail to start
- Image pull errors
- Resource constraints causing pod evictions
- Node becoming unresponsive or marked as “NotReady”
Example issues you might see:
Failed to pull image "nginx:latest": rpc error: code = Unknown desc = Error response from daemon: pull access denied
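Since most nodes run the kubelet under systemd, its logs usually live in the journal rather than a flat file. The commands below (node and pod names are placeholders) are how I would chase down an issue like the image-pull error above:
# Kubelet logs on a systemd-managed node
journalctl -u kubelet --since "1 hour ago"
# Events for a specific pod: image pulls, evictions, failed mounts
kubectl get events --field-selector involvedObject.name=<pod-name>
# Node conditions, capacity, and allocated resources
kubectl describe node <node-name>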
Control Plane Components: The Brain of Your Cluster
The control plane consists of several critical components, each with its own logging that provides insights into cluster-wide operations.
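In kubeadm-based clusters these components run as static pods in the kube-system namespace, so their logs are reachable through kubectl as well as the file paths listed below. The label selectors here match what kubeadm sets by default; verify them in your own cluster:
# List the control plane pods
kubectl get pods -n kube-system -l tier=control-plane
# Tail a component's logs by its component label
kubectl logs -n kube-system -l component=kube-apiserver --tail=50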
API Server Logs: The Central Nervous System
Location: /var/log/kube-apiserver.log
The API server is the central hub of your Kubernetes cluster. Every request to create, modify, or delete resources goes through the API server, making its logs invaluable for understanding cluster-wide operations and access patterns.
What you’ll find here:
- Authentication and authorization failures
- API request patterns and performance
- Resource validation errors
- Webhook failures
- Client connection issues
- Rate limiting events
Key debugging scenarios:
- Users can’t access cluster resources
- kubectl commands are failing
- Custom resource definitions aren’t working
- Admission controllers are rejecting requests
- Performance issues with API operations
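Two commands I find useful when the API server itself is the suspect; the label selector assumes a kubeadm-style static pod, so adjust it for your setup:
# Scan recent logs for auth failures, webhook errors, or throttling
kubectl logs -n kube-system -l component=kube-apiserver --tail=200 | grep -iE "forbidden|unauthorized|webhook|throttl"
# Verbose health check of the API server's internal components
kubectl get --raw '/readyz?verbose'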
Controller Manager Logs: Resource Lifecycle Management
Location: /var/log/kube-controller-manager.log
The controller manager runs various controllers that manage the lifecycle of Kubernetes resources. These logs are essential for understanding why resources aren’t reaching their desired state.
What you’ll find here:
- ReplicaSet scaling issues
- Deployment rollout problems
- Service endpoint updates
- Node controller issues
- Namespace deletion problems
- Resource quota enforcement
Common troubleshooting scenarios:
- Deployments are stuck and not progressing
- ReplicaSets aren’t creating the right number of pods
- Services aren’t updating their endpoints
- Nodes aren’t being properly managed
- Resource cleanup issues
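When a rollout is stuck, I correlate the resource's own status with what the controller manager is reporting; placeholder names and a kubeadm-style label selector:
# Why is the rollout not progressing?
kubectl rollout status deployment/<deployment-name>
kubectl describe deployment <deployment-name>
# The controller manager's view of the same events
kubectl logs -n kube-system -l component=kube-controller-manager --tail=100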
Scheduler Logs: Pod Placement Decisions
Location: /var/log/kube-scheduler.log
The scheduler is responsible for placing pods on appropriate nodes based on resource requirements, constraints, and policies. Scheduler logs help you understand why pods aren’t being scheduled or are placed on unexpected nodes.
What you’ll find here:
- Pod scheduling decisions and rationale
- Resource constraint violations
- Node affinity and anti-affinity rule evaluations
- Taints and tolerations processing
- Priority and preemption events
When to check scheduler logs:
- Pods remain in “Pending” state
- Pods are scheduled on inappropriate nodes
- Resource requests aren’t being honored
- Affinity rules aren’t working as expected
- Priority classes aren’t being respected
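For a pod stuck in Pending, the scheduling verdict usually shows up in the pod's events before you even open the scheduler logs; a quick sketch with placeholder names:
# FailedScheduling events spell out unmet resource requests, taints, or affinity rules
kubectl describe pod <pending-pod-name>
kubectl get events --field-selector reason=FailedScheduling
# The scheduler's full decision trail (kubeadm-style label selector)
kubectl logs -n kube-system -l component=kube-scheduler --tail=100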
etcd Logs: The Cluster’s Memory
Location: Varies based on etcd deployment method
etcd is Kubernetes’ distributed key-value store that maintains all cluster state. Its logs are crucial for understanding data consistency issues and cluster stability problems.
What you’ll find here:
- Data consistency problems
- Leader election issues
- Network partition recovery
- Backup and restore operations
- Performance and latency issues
Critical scenarios:
- Cluster state inconsistencies
- etcd cluster member failures
- Split-brain scenarios
- Data corruption issues
- Performance degradation
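For a kubeadm-managed etcd you can run a quick health check from inside the etcd static pod; the certificate paths below are kubeadm defaults and will differ on other installs:
# Member status and database size from inside the etcd static pod
kubectl -n kube-system exec etcd-<node-name> -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint status --write-out=table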
Node-Level Insights
System Logs (Syslog): The Operating System Perspective
Location: Local to each Linux distribution (typically /var/log/syslog or /var/log/messages)
System logs provide the operating system’s view of what’s happening on each node, including hardware issues, kernel problems, and system-level resource constraints.
What you’ll find here:
- Hardware failures and warnings
- Kernel panics and crashes
- Memory pressure and OOM kills
- Disk space and I/O issues
- Network interface problems
- Security events and violations
When to investigate system logs:
- Nodes become unresponsive
- Mysterious pod crashes
- Performance degradation
- Hardware alerts
- Security incidents
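On the node itself, a few standard commands surface most of these system-level issues:
# Kernel messages: OOM kills, disk errors, NIC resets
journalctl -k --since "1 hour ago"
dmesg | grep -i "out of memory"
# General system log (the path depends on the distribution)
tail -f /var/log/syslog    # Debian/Ubuntu
tail -f /var/log/messages  # RHEL/CentOS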
Application-Level Logging
Application Logs: Your Code’s Voice
Location: /var/log/app.log (or wherever your application writes logs)
Application logs contain the business logic output of your containerized applications. These logs are customized by your development team and contain application-specific information.
What you’ll typically find:
- Business logic errors
- User interaction patterns
- Performance metrics
- Security events
- Custom debugging information
- Integration failures with external services
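If your application writes to a file instead of stdout, you can still read it without leaving kubectl; the path here is just the example location above:
# Tail a file-based log inside a running container
kubectl exec <pod-name> -c <container-name> -- tail -n 100 /var/log/app.log
That said, writing to stdout/stderr remains the Kubernetes-friendly default, because that is what kubectl logs and node-level log collection pick up.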
Custom Logs: Specialized Monitoring
Location: /var/log/custom-app.log
Custom logs are specialized logging outputs created for specific use cases, monitoring requirements, or compliance needs.
Examples include:
- Audit logs for compliance
- Security event logs
- Performance monitoring data
- Custom metrics and analytics
- Integration logs with external systems
Best Practices for Kubernetes Logging
1. Implement Centralized Logging
Don’t rely on accessing logs directly on nodes. Implement a centralized logging solution using tools like:
- ELK Stack (Elasticsearch, Logstash, Kibana)
- EFK Stack (Elasticsearch, Fluentd, Kibana)
- Grafana Loki with Promtail
- Cloud-native solutions like AWS CloudWatch, Google Cloud Logging, or Azure Monitor
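As one illustration (not an endorsement), Grafana Loki with Promtail can be installed in a couple of Helm commands. Chart names and default values change over time, so treat this as a sketch and check the current Grafana Helm chart documentation:
# Add the Grafana chart repository and install Loki with the Promtail log shipper
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install loki grafana/loki-stack --namespace logging --create-namespace --set promtail.enabled=true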
2. Structure Your Application Logs
Ensure your applications output structured logs (JSON format) with consistent fields:
{
"timestamp": "2024-05-31T10:30:00Z",
"level": "ERROR",
"message": "Failed to connect to database",
"service": "user-service",
"traceId": "abc123",
"error": "connection timeout"
}
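One payoff of structured logs is that they are trivially filterable. Assuming every log line is a JSON object shaped like the example above, jq can pull out exactly what you need:
# Filter a JSON log stream down to ERROR entries, or to a single trace
kubectl logs <pod-name> | jq -c 'select(.level == "ERROR")'
kubectl logs <pod-name> | jq -c 'select(.traceId == "abc123")'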
3. Use Log Levels Appropriately
Implement proper log levels in your applications:
- DEBUG: Detailed information for development
- INFO: General operational information
- WARN: Potentially harmful situations
- ERROR: Error events that don’t stop the application
- FATAL: Severe errors that cause application termination
4. Implement Log Rotation and Retention
Configure appropriate log rotation and retention policies to prevent disk space issues:
- Set maximum log file sizes
- Implement automated cleanup of old logs
- Consider compliance requirements for log retention
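For container logs specifically, rotation is handled by the kubelet via the containerLogMaxSize and containerLogMaxFiles fields of its configuration (or the equivalent --container-log-max-size / --container-log-max-files flags). A quick way to check whether log files are eating a node's disk:
# How much space are container and pod logs using on this node?
du -sh /var/log/containers/ /var/log/pods/
# The ten largest individual log files
du -ah /var/log/pods/ | sort -rh | head -n 10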
5. Monitor Log Volume and Performance
Keep track of logging overhead:
- Monitor log volume to prevent storage issues
- Ensure logging doesn’t impact application performance
- Use sampling for high-volume debug logs
Troubleshooting Workflows
Application Issues
- Start with container logs for the specific pod
- Check pod logs if multiple containers are involved
- Review kubelet logs if pod lifecycle issues are suspected
- Examine system logs for node-level problems
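Condensed into commands, that first workflow might look like this (names are placeholders, and the journalctl steps run on the node itself):
kubectl logs <pod-name> -c <container-name> --tail=200     # 1. container logs
kubectl logs <pod-name> --all-containers=true --prefix     # 2. every container in the pod
journalctl -u kubelet --since "30 minutes ago"             # 3. kubelet, on the node
journalctl -k --since "30 minutes ago"                     # 4. kernel/system logs, on the node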
Cluster-Wide Issues
- Begin with API server logs for authentication/authorization problems
- Check controller manager logs for resource management issues
- Review scheduler logs for pod placement problems
- Investigate etcd logs for data consistency issues
Performance Problems
- Application logs for business logic performance
- Kubelet logs for resource allocation issues
- System logs for hardware and OS-level constraints
- API server logs for request processing bottlenecks
Conclusion
Understanding Kubernetes logs is essential for maintaining healthy, performant clusters. Each log type provides unique insights into different aspects of your system, from individual application behavior to cluster-wide operations. By knowing where to look and what to look for, you can dramatically reduce troubleshooting time and improve your cluster’s reliability.
Remember that effective logging is not just about collecting data—it’s about having the right information available when you need it. Implement centralized logging, use structured formats, and establish clear troubleshooting workflows to make the most of your Kubernetes logging strategy.
The investment in understanding and properly implementing Kubernetes logging will pay dividends in reduced downtime, faster issue resolution, and better overall system visibility. Start with the basics, gradually implement more sophisticated logging practices, and always keep learning as your Kubernetes expertise grows.
Cheers,
Sim