Mastering Kubernetes Logs

Kubernetes has revolutionized container orchestration, but with great power comes great complexity. When issues arise in your cluster, logs become your best friend for troubleshooting and understanding what’s happening under the hood. However, Kubernetes generates logs at multiple levels, each serving different purposes and containing different types of information.

Understanding where to look and what each log type tells you can mean the difference between quickly resolving an issue and spending hours debugging in the dark. This comprehensive guide will walk you through every type of Kubernetes log, where to find them, and how to interpret what they’re telling you.

The Kubernetes Logging Landscape

Kubernetes logging operates on multiple layers, from individual containers to the entire cluster infrastructure. Each layer provides unique insights into different aspects of your system’s behavior. Let’s explore each type systematically.

Container and Pod Level Logging

Container Logs: The Foundation of Troubleshooting

Location: /var/log/containers/*.log (on each node; these files are symlinks into the per-pod directories under /var/log/pods/)

Container logs are often your first stop when debugging application issues. These logs capture everything your application writes to stdout and stderr, making them essential for understanding application behavior, errors, and performance issues.

What you’ll find here:

  • Application error messages and stack traces
  • Runtime exceptions and crashes
  • Configuration problems
  • Performance bottlenecks
  • Custom application logging output

Common scenarios for checking container logs:

  • Your application is crashing repeatedly
  • Users report errors or unexpected behavior
  • Performance issues like slow response times
  • Configuration validation problems

Pro tip: Use kubectl logs <pod-name> -c <container-name> to access these logs directly, or kubectl logs <pod-name> --previous to see logs from a crashed container.
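
A few variations of kubectl logs cover most day-to-day situations; the pod and container names below are placeholders:

# Stream logs from a running pod
kubectl logs my-pod -f

# Logs from a specific container in a multi-container pod
kubectl logs my-pod -c my-container

# Logs from the previous (crashed) instance of a container
kubectl logs my-pod --previous

# Limit output to the last hour or the last 100 lines
kubectl logs my-pod --since=1h --tail=100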

Pod Logs: Understanding Multi-Container Interactions

Location: /var/log/pods/&lt;namespace&gt;_&lt;pod-name&gt;_&lt;pod-uid&gt;/&lt;container-name&gt;/*.log

Pod logs provide a broader view than individual container logs, especially valuable when dealing with multi-container pods. These logs help you understand how containers within a pod interact with each other and can reveal issues that aren’t apparent when looking at containers in isolation.

What you’ll find here:

  • Inter-container communication issues
  • Shared volume problems
  • Network connectivity issues between containers
  • Resource contention between containers in the same pod
  • Init container problems affecting main containers

When to check pod logs:

  • Sidecar containers aren’t working correctly
  • Service mesh issues (like Istio proxy problems)
  • Init containers are failing
  • Containers can’t communicate within the pod
  • Shared volume mounting issues
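
When debugging multi-container pods, it often helps to pull logs for every container at once or to target an init container directly. A rough sketch, with pod, container, and label names as placeholders:

# Logs from all containers in a pod
kubectl logs my-pod --all-containers=true

# Logs from a specific init container
kubectl logs my-pod -c init-db-migration

# Logs from all pods matching a label selector, prefixed with pod/container names
kubectl logs -l app=my-app --all-containers=true --prefix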

Kubelet: The Node Agent’s Perspective

Location: /var/log/kubelet.log (on systemd-based nodes the kubelet usually logs to the journal instead; use journalctl -u kubelet)

The kubelet is the primary node agent responsible for managing pods and containers on each node. Its logs provide crucial insights into pod lifecycle management, resource allocation, and communication with the control plane.

What you’ll find here:

  • Pod scheduling and startup issues
  • Resource allocation problems (CPU, memory, storage)
  • Image pulling failures
  • Volume mounting problems
  • Node health and capacity issues
  • Communication problems with the API server

Critical scenarios for kubelet logs:

  • Pods are stuck in “Pending” state
  • Containers fail to start
  • Image pull errors
  • Resource constraints causing pod evictions
  • Node becoming unresponsive or marked as “NotReady”

Example issues you might see:

Failed to pull image "nginx:latest": rpc error: code = Unknown desc = Error response from daemon: pull access denied
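
Where the kubelet's own logs live depends on how the node is set up; on systemd-based nodes a sketch like the following usually works, and many kubelet-level problems also surface as events you can read without node access (pod name is a placeholder):

# Kubelet logs on a systemd-based node
journalctl -u kubelet --since "1 hour ago"

# Pod events (image pull errors, failed mounts, evictions) via the API
kubectl describe pod my-pod
kubectl get events --sort-by=.metadata.creationTimestamp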

Control Plane Components: The Brain of Your Cluster

The control plane consists of several critical components, each with its own logging that provides insights into cluster-wide operations.

API Server Logs: The Central Nervous System

Location: /var/log/kube-apiserver.log (or the API server's container logs in the kube-system namespace when it runs as a static pod)

The API server is the central hub of your Kubernetes cluster. Every request to create, modify, or delete resources goes through the API server, making its logs invaluable for understanding cluster-wide operations and access patterns.

What you’ll find here:

  • Authentication and authorization failures
  • API request patterns and performance
  • Resource validation errors
  • Webhook failures
  • Client connection issues
  • Rate limiting events

Key debugging scenarios:

  • Users can’t access cluster resources
  • kubectl commands are failing
  • Custom resource definitions aren’t working
  • Admission controllers are rejecting requests
  • Performance issues with API operations
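
On kubeadm-style clusters the API server runs as a static pod, so its logs are reachable with kubectl as long as the API server is still answering; otherwise you will need to read the log files on the control plane node directly. A sketch, with the node name as a placeholder:

# API server logs when it runs as a static pod (kubeadm-style clusters)
kubectl logs -n kube-system kube-apiserver-control-plane-1

# Or select it by label
kubectl logs -n kube-system -l component=kube-apiserver --tail=200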

Controller Manager Logs: Resource Lifecycle Management

Location: /var/log/kube-controller-manager.log

The controller manager runs various controllers that manage the lifecycle of Kubernetes resources. These logs are essential for understanding why resources aren’t reaching their desired state.

What you’ll find here:

  • ReplicaSet scaling issues
  • Deployment rollout problems
  • Service endpoint updates
  • Node controller issues
  • Namespace deletion problems
  • Resource quota enforcement

Common troubleshooting scenarios:

  • Deployments are stuck and not progressing
  • ReplicaSets aren’t creating the right number of pods
  • Services aren’t updating their endpoints
  • Nodes aren’t being properly managed
  • Resource cleanup issues
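
Before diving into controller manager logs, it is usually worth confirming what the affected workload itself reports; a rough sequence, with names as placeholders:

# See why a rollout is stuck
kubectl rollout status deployment/my-deployment
kubectl describe deployment my-deployment

# Controller manager logs on a kubeadm-style cluster
kubectl logs -n kube-system -l component=kube-controller-manager --tail=200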

Scheduler Logs: Pod Placement Decisions

Location: /var/log/kube-scheduler.log

The scheduler is responsible for placing pods on appropriate nodes based on resource requirements, constraints, and policies. Scheduler logs help you understand why pods aren’t being scheduled or are placed on unexpected nodes.

What you’ll find here:

  • Pod scheduling decisions and rationale
  • Resource constraint violations
  • Node affinity and anti-affinity rule evaluations
  • Taints and tolerations processing
  • Priority and preemption events

When to check scheduler logs:

  • Pods remain in “Pending” state
  • Pods are scheduled on inappropriate nodes
  • Resource requests aren’t being honored
  • Affinity rules aren’t working as expected
  • Priority classes aren’t being respected
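
For a pod stuck in Pending, the scheduler's reasoning is usually surfaced as events on the pod itself, so kubectl describe is a faster first step than the raw scheduler logs (pod name is a placeholder):

# FailedScheduling events explain unsatisfied resources, taints, or affinity rules
kubectl describe pod my-pending-pod

# Scheduler logs on a kubeadm-style cluster
kubectl logs -n kube-system -l component=kube-scheduler --tail=200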

etcd Logs: The Cluster’s Memory

Location: Varies based on etcd deployment method (commonly the etcd static pod's container logs, or journalctl -u etcd when etcd runs as a systemd service)

etcd is Kubernetes’ distributed key-value store that maintains all cluster state. Its logs are crucial for understanding data consistency issues and cluster stability problems.

What you’ll find here:

  • Data consistency problems
  • Leader election issues
  • Network partition recovery
  • Backup and restore operations
  • Performance and latency issues

Critical scenarios:

  • Cluster state inconsistencies
  • etcd cluster member failures
  • Split-brain scenarios
  • Data corruption issues
  • Performance degradation
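
If etcd runs as a static pod, as on kubeadm clusters, its logs and health are reachable roughly as sketched below; the certificate paths are the kubeadm defaults and may differ in your environment:

# etcd logs when it runs as a static pod
kubectl logs -n kube-system -l component=etcd --tail=200

# Check member health from a control plane node (kubeadm default cert paths)
ETCDCTL_API=3 etcdctl endpoint health \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key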

Node-Level Insights

System Logs (Syslog): The Operating System Perspective

Location: Local to each Linux distribution (typically /var/log/syslog or /var/log/messages)

System logs provide the operating system’s view of what’s happening on each node, including hardware issues, kernel problems, and system-level resource constraints.

What you’ll find here:

  • Hardware failures and warnings
  • Kernel panics and crashes
  • Memory pressure and OOM kills
  • Disk space and I/O issues
  • Network interface problems
  • Security events and violations

When to investigate system logs:

  • Nodes become unresponsive
  • Mysterious pod crashes
  • Performance degradation
  • Hardware alerts
  • Security incidents
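
OOM kills in particular show up in the kernel log rather than in any Kubernetes component, so a quick check on a suspect node might look like this:

# Kernel messages, including OOM killer activity
journalctl -k --since "2 hours ago" | grep -i "out of memory"
dmesg -T | grep -i "killed process"

# General system log, depending on the distribution
tail -f /var/log/syslog      # Debian/Ubuntu
tail -f /var/log/messages    # RHEL/CentOS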

Application-Level Logging

Application Logs: Your Code’s Voice

Location: /var/log/app.log (or wherever your application writes logs)

Application logs contain the business logic output of your containerized applications. These logs are customized by your development team and contain application-specific information. In Kubernetes, the simplest way to expose them is to write to stdout/stderr so they are captured as container logs, though some applications also write to files inside the container or on mounted volumes.

What you’ll typically find:

  • Business logic errors
  • User interaction patterns
  • Performance metrics
  • Security events
  • Custom debugging information
  • Integration failures with external services

Custom Logs: Specialized Monitoring

Location: /var/log/custom-app.log

Custom logs are specialized logging outputs created for specific use cases, monitoring requirements, or compliance needs.

Examples include:

  • Audit logs for compliance
  • Security event logs
  • Performance monitoring data
  • Custom metrics and analytics
  • Integration logs with external systems

Best Practices for Kubernetes Logging

1. Implement Centralized Logging

Don’t rely on accessing logs directly on nodes. Implement a centralized logging solution using tools like:

  • ELK Stack (Elasticsearch, Logstash, Kibana)
  • EFK Stack (Elasticsearch, Fluentd, Kibana)
  • Grafana Loki with Promtail
  • Cloud-native solutions like AWS CloudWatch, Google Cloud Logging, or Azure Monitor
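
As one illustrative option, a Loki-based stack can be installed with Helm roughly as follows; chart names and values change over time, so treat this as a sketch rather than a recipe:

# Add the Grafana chart repository and install a Loki + Promtail stack
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install loki grafana/loki-stack --namespace logging --create-namespace \
  --set promtail.enabled=true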

2. Structure Your Application Logs

Ensure your applications output structured logs (JSON format) with consistent fields:

{
  "timestamp": "2024-05-31T10:30:00Z",
  "level": "ERROR",
  "message": "Failed to connect to database",
  "service": "user-service",
  "traceId": "abc123",
  "error": "connection timeout"
}
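
One immediate payoff of structured logs is that they can be filtered mechanically. Assuming each log line is a JSON object and jq is installed locally, something like this pulls out only the error entries (pod name is a placeholder):

# Filter structured (NDJSON) logs down to errors
kubectl logs my-pod | jq 'select(.level == "ERROR")'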

3. Use Log Levels Appropriately

Implement proper log levels in your applications:

  • DEBUG: Detailed information for development
  • INFO: General operational information
  • WARN: Potentially harmful situations
  • ERROR: Error events that don’t stop the application
  • FATAL: Severe errors that cause application termination

4. Implement Log Rotation and Retention

Configure appropriate log rotation and retention policies to prevent disk space issues:

  • Set maximum log file sizes
  • Implement automated cleanup of old logs
  • Consider compliance requirements for log retention
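
On the node side, the kubelet itself rotates container log files. A sketch of the relevant settings, shown here as command-line flags with example values (newer versions prefer the containerLogMaxSize and containerLogMaxFiles fields in the kubelet configuration file):

# Kubelet flags controlling container log rotation (values are examples)
kubelet --container-log-max-size=10Mi --container-log-max-files=5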

5. Monitor Log Volume and Performance

Keep track of logging overhead:

  • Monitor log volume to prevent storage issues
  • Ensure logging doesn’t impact application performance
  • Use sampling for high-volume debug logs

Troubleshooting Workflows

Application Issues

  1. Start with container logs for the specific pod
  2. Check pod logs if multiple containers are involved
  3. Review kubelet logs if pod lifecycle issues are suspected
  4. Examine system logs for node-level problems
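
Put together as commands, that workflow might look roughly like this (names are placeholders; the journalctl steps run on the node):

kubectl logs my-pod -c my-container --previous   # 1. Container logs, including the last crash
kubectl logs my-pod --all-containers=true        # 2. All containers in the pod
journalctl -u kubelet | grep my-pod              # 3. The kubelet's view of the pod lifecycle
journalctl -k | grep -i oom                      # 4. Node-level kernel issues such as OOM kills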

Cluster-Wide Issues

  1. Begin with API server logs for authentication/authorization problems
  2. Check controller manager logs for resource management issues
  3. Review scheduler logs for pod placement problems
  4. Investigate etcd logs for data consistency issues

Performance Problems

  1. Start with application logs for business logic performance
  2. Check kubelet logs for resource allocation issues
  3. Review system logs for hardware and OS-level constraints
  4. Examine API server logs for request processing bottlenecks

Conclusion

Understanding Kubernetes logs is essential for maintaining healthy, performant clusters. Each log type provides unique insights into different aspects of your system, from individual application behavior to cluster-wide operations. By knowing where to look and what to look for, you can dramatically reduce troubleshooting time and improve your cluster’s reliability.

Remember that effective logging is not just about collecting data—it’s about having the right information available when you need it. Implement centralized logging, use structured formats, and establish clear troubleshooting workflows to make the most of your Kubernetes logging strategy.

The investment in understanding and properly implementing Kubernetes logging will pay dividends in reduced downtime, faster issue resolution, and better overall system visibility. Start with the basics, gradually implement more sophisticated logging practices, and always keep learning as your Kubernetes expertise grows.

Cheers,

Sim