Logging Architecture
Application logs can help you understand what is happening inside your application. The logs are particularly useful for debugging problems and monitoring cluster activity. Most modern applications have some kind of logging mechanism. Likewise, container engines are designed to support logging. The easiest and most adopted logging method for containerized applications is writing to standard output and standard error streams.
However, the native functionality provided by a container engine or runtime is usually not enough for a complete logging solution. For example, you may want access your application's logs if a container crashes; a pod gets evicted; or a node dies. In a cluster, logs should have a separate storage and lifecycle independent of nodes, pods, or containers. This concept is called cluster-level logging.
Cluster-level logging architectures require a separate backend to store, analyze, and query logs. Kubernetes does not provide a native storage solution for log data. Instead, there are many logging solutions that integrate with Kubernetes. The following sections describe how to handle and store logs on nodes.
Basic logging in Kubernetes
This example uses a Pod
specification with a container
to write text to the standard output stream once per second.
apiVersion: v1
kind: Pod
metadata:
name: counter
spec:
containers:
- name: count
image: busybox
args: [/bin/sh, -c,
'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done']
To run this pod, use the following command:
kubectl apply -f https://k8s.io/examples/debug/counter-pod.yaml
The output is:
pod/counter created
To fetch the logs, use the kubectl logs
command, as follows:
kubectl logs counter
The output is:
0: Mon Jan 1 00:00:00 UTC 2001
1: Mon Jan 1 00:00:01 UTC 2001
2: Mon Jan 1 00:00:02 UTC 2001
...
You can use kubectl logs --previous
to retrieve logs from a previous instantiation of a container. If your pod has multiple containers, specify which container's logs you want to access by appending a container name to the command. See the kubectl logs
documentation for more details.
Logging at the node level
A container engine handles and redirects any output generated to a containerized application's stdout
and stderr
streams.
For example, the Docker container engine redirects those two streams to a logging driver, which is configured in Kubernetes to write to a file in JSON format.
By default, if a container restarts, the kubelet keeps one terminated container with its logs. If a pod is evicted from the node, all corresponding containers are also evicted, along with their logs.
An important consideration in node-level logging is implementing log rotation,
so that logs don't consume all available storage on the node. Kubernetes
is not responsible for rotating logs, but rather a deployment tool
should set up a solution to address that.
For example, in Kubernetes clusters, deployed by the kube-up.sh
script,
there is a logrotate
tool configured to run each hour. You can also set up a container runtime to
rotate an application's logs automatically.
As an example, you can find detailed information about how kube-up.sh
sets
up logging for COS image on GCP in the corresponding
configure-helper
script.
When using a CRI container runtime, the kubelet is responsible for rotating the logs and managing the logging directory structure.
The kubelet sends this information to the CRI container runtime and the runtime writes the container logs to the given location.
The two kubelet parameters containerLogMaxSize
and containerLogMaxFiles
in kubelet config file
can be used to configure the maximum size for each log file and the maximum number of files allowed for each container respectively.
When you run kubectl logs
as in
the basic logging example, the kubelet on the node handles the request and
reads directly from the log file. The kubelet returns the content of the log file.
kubectl logs
. For example, if there's a 10MB file, logrotate
performs
the rotation and there are two files: one file that is 10MB in size and a second file that is empty.
kubectl logs
returns the latest log file which in this example is an empty response.
System component logs
There are two types of system components: those that run in a container and those that do not run in a container. For example:
- The Kubernetes scheduler and kube-proxy run in a container.
- The kubelet and container runtime do not run in containers.
On machines with systemd, the kubelet and container runtime write to journald. If
systemd is not present, the kubelet and container runtime write to .log
files
in the /var/log
directory. System components inside containers always write
to the /var/log
directory, bypassing the default logging mechanism.
They use the klog
logging library. You can find the conventions for logging severity for those
components in the development docs on logging.
Similar to the container logs, system component logs in the /var/log
directory should be rotated. In Kubernetes clusters brought up by
the kube-up.sh
script, those logs are configured to be rotated by
the logrotate
tool daily or once the size exceeds 100MB.
Cluster-level logging architectures
While Kubernetes does not provide a native solution for cluster-level logging, there are several common approaches you can consider. Here are some options:
- Use a node-level logging agent that runs on every node.
- Include a dedicated sidecar container for logging in an application pod.
- Push logs directly to a backend from within an application.
Using a node logging agent
You can implement cluster-level logging by including a node-level logging agent on each node. The logging agent is a dedicated tool that exposes logs or pushes logs to a backend. Commonly, the logging agent is a container that has access to a directory with log files from all of the application containers on that node.
Because the logging agent must run on every node, it is recommended to run the agent
as a DaemonSet
.
Node-level logging creates only one agent per node and doesn't require any changes to the applications running on the node.
Containers write stdout and stderr, but with no agreed format. A node-level agent collects these logs and forwards them for aggregation.
Using a sidecar container with the logging agent
You can use a sidecar container in one of the following ways:
- The sidecar container streams application logs to its own
stdout
. - The sidecar container runs a logging agent, which is configured to pick up logs from an application container.
Streaming sidecar container
By having your sidecar containers write to their own stdout
and stderr
streams, you can take advantage of the kubelet and the logging agent that
already run on each node. The sidecar containers read logs from a file, a socket,
or journald. Each sidecar container prints a log to its own stdout
or stderr
stream.
This approach allows you to separate several log streams from different
parts of your application, some of which can lack support
for writing to stdout
or stderr
. The logic behind redirecting logs
is minimal, so it's not a significant overhead. Additionally, because
stdout
and stderr
are handled by the kubelet, you can use built-in tools
like kubectl logs
.
For example, a pod runs a single container, and the container writes to two different log files using two different formats. Here's a configuration file for the Pod:
apiVersion: v1
kind: Pod
metadata:
name: counter
spec:
containers:
- name: count
image: busybox
args:
- /bin/sh
- -c
- >
i=0;
while true;
do
echo "$i: $(date)" >> /var/log/1.log;
echo "$(date) INFO $i" >> /var/log/2.log;
i=$((i+1));
sleep 1;
done
volumeMounts:
- name: varlog
mountPath: /var/log
volumes:
- name: varlog
emptyDir: {}
It is not recommended to write log entries with different formats to the same log
stream, even if you managed to redirect both components to the stdout
stream of
the container. Instead, you can create two sidecar containers. Each sidecar
container could tail a particular log file from a shared volume and then redirect
the logs to its own stdout
stream.
Here's a configuration file for a pod that has two sidecar containers:
apiVersion: v1
kind: Pod
metadata:
name: counter
spec:
containers:
- name: count
image: busybox
args:
- /bin/sh
- -c
- >
i=0;
while true;
do
echo "$i: $(date)" >> /var/log/1.log;
echo "$(date) INFO $i" >> /var/log/2.log;
i=$((i+1));
sleep 1;
done
volumeMounts:
- name: varlog
mountPath: /var/log
- name: count-log-1
image: busybox
args: [/bin/sh, -c, 'tail -n+1 -f /var/log/1.log']
volumeMounts:
- name: varlog
mountPath: /var/log
- name: count-log-2
image: busybox
args: [/bin/sh, -c, 'tail -n+1 -f /var/log/2.log']
volumeMounts:
- name: varlog
mountPath: /var/log
volumes:
- name: varlog
emptyDir: {}
Now when you run this pod, you can access each log stream separately by running the following commands:
kubectl logs counter count-log-1
The output is:
0: Mon Jan 1 00:00:00 UTC 2001
1: Mon Jan 1 00:00:01 UTC 2001
2: Mon Jan 1 00:00:02 UTC 2001
...
kubectl logs counter count-log-2
The output is:
Mon Jan 1 00:00:00 UTC 2001 INFO 0
Mon Jan 1 00:00:01 UTC 2001 INFO 1
Mon Jan 1 00:00:02 UTC 2001 INFO 2
...
The node-level agent installed in your cluster picks up those log streams automatically without any further configuration. If you like, you can configure the agent to parse log lines depending on the source container.
Note, that despite low CPU and memory usage (order of a couple of millicores
for cpu and order of several megabytes for memory), writing logs to a file and
then streaming them to stdout
can double disk usage. If you have
an application that writes to a single file, it's recommended to set
/dev/stdout
as the destination rather than implement the streaming sidecar
container approach.
Sidecar containers can also be used to rotate log files that cannot be
rotated by the application itself. An example of this approach is a small container running logrotate
periodically.
However, it's recommended to use stdout
and stderr
directly and leave rotation
and retention policies to the kubelet.
Sidecar container with a logging agent
If the node-level logging agent is not flexible enough for your situation, you can create a sidecar container with a separate logging agent that you have configured specifically to run with your application.
kubectl logs
because they are not controlled
by the kubelet.
Here are two configuration files that you can use to implement a sidecar container with a logging agent. The first file contains
a ConfigMap
to configure fluentd.
apiVersion: v1
kind: ConfigMap
metadata:
name: fluentd-config
data:
fluentd.conf: |
<source>
type tail
format none
path /var/log/1.log
pos_file /var/log/1.log.pos
tag count.format1
</source>
<source>
type tail
format none
path /var/log/2.log
pos_file /var/log/2.log.pos
tag count.format2
</source>
<match **>
type google_cloud
</match>
The second file describes a pod that has a sidecar container running fluentd. The pod mounts a volume where fluentd can pick up its configuration data.
apiVersion: v1
kind: Pod
metadata:
name: counter
spec:
containers:
- name: count
image: busybox
args:
- /bin/sh
- -c
- >
i=0;
while true;
do
echo "$i: $(date)" >> /var/log/1.log;
echo "$(date) INFO $i" >> /var/log/2.log;
i=$((i+1));
sleep 1;
done
volumeMounts:
- name: varlog
mountPath: /var/log
- name: count-agent
image: k8s.gcr.io/fluentd-gcp:1.30
env:
- name: FLUENTD_ARGS
value: -c /etc/fluentd-config/fluentd.conf
volumeMounts:
- name: varlog
mountPath: /var/log
- name: config-volume
mountPath: /etc/fluentd-config
volumes:
- name: varlog
emptyDir: {}
- name: config-volume
configMap:
name: fluentd-config
In the sample configurations, you can replace fluentd with any logging agent, reading from any source inside an application container.
Exposing logs directly from the application
Cluster-logging that exposes or pushes logs directly from every application is outside the scope of Kubernetes.