Containerized Cloud Logging
Vinod Gupta
Sr Technologist | Azure, MicroServices, Event Sourcing & CQRS | .NET Core, C#, Java
1. Overview
As with any service, logging is a core component of Docker. Analyzing logs provides insight into the performance, stability, and reliability of containers and the Docker service itself. However, because of the flexible and dynamic nature of Docker, there’s no single approach to gathering and storing log events. Instead, we have a variety of solutions at our disposal, each with its own benefits and drawbacks.
2. Questions
- How can we log and monitor Docker effectively? This includes logging the Docker runtime infrastructure, the container itself and what goes on inside of it, and ensuring we collect log data from ephemeral containers.
- How can we use feedback from containers to manage and improve the quality of our services?
- Can we build from decades of experience logging monolithic applications, or do we have to start from scratch?
- If we must start from scratch, how can we build a solution that helps us make better decisions?
3. Traditional vs Centralized Logging
With traditional logging methods, we would choose from a variety of logging frameworks and then define a logging strategy in which each container (or service) logs independently of the other containers.
Alternatively, we can configure our containers to forward their logs to a central logging service. Each container still needs a way to generate logs, but a logging service becomes responsible for processing, storing, and forwarding them to a centralized destination such as Loggly.
4. Key Considerations When Logging in Docker
Although there are some similarities, container-based logging is still very different from traditional application-based logging. Below are a few things to keep in mind.
4.1 Containers Are Transient
Containers come and go. They start, they stop, they’re destroyed, and they’re rebuilt on a regular basis. Storing persistent application data inside of a container is an anti-pattern with Docker, since the data will be lost once the container is removed. While containers can store persistent data using volumes, the recommended solution is to export data (logs or otherwise) to a service that can store it long-term, whether it’s a folder on the local hard drive, Azure File Storage, Azure Blob Storage, or an Amazon S3 bucket. This way, you can stop and start your containers without compromising your data.
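As a minimal sketch of the "export to the host" idea, a host directory can be bind-mounted into a container so that log files outlive the container itself (the host path and image name here are illustrative assumptions):

```shell
# Bind-mount a host directory into the container for log output.
# /var/log/myapp on the host and the myapp:latest image are assumptions.
docker run -d \
  --name myapp \
  -v /var/log/myapp:/app/logs \
  myapp:latest

# Log files written to /app/logs inside the container now persist
# in /var/log/myapp on the host, even after the container is removed.
```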
4.2 Containers Are Multi-Tiered
Docker logging isn’t as simple as configuring a framework and running the container. Even the simplest Docker installation has at least three distinct levels of logging: the Docker container, the Docker service, and the host operating system (OS). As the infrastructure becomes more complex and more containers are deployed, we need a way to associate log events with specific processes rather than just their host containers. We can define custom tags for each log event as it passes through the container and later correlate those events to look up specific logs.
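One way to attach such tags is the Docker logging driver's `tag` option, which templates container metadata (name, ID, image) into every log line; the syslog address and image name below are assumptions:

```shell
# Tag each log event with the container name and ID so events can
# later be correlated back to the process that produced them.
docker run -d \
  --log-driver=syslog \
  --log-opt syslog-address=udp://127.0.0.1:514 \
  --log-opt tag="{{.Name}}/{{.ID}}" \
  myapp:latest
```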
4.3 Containers Are Complex
Docker is robust enough for many enterprises, but there are lingering security issues that have yet to be resolved. Compared to virtual machines, containers pose a much larger attack vector since they share the same kernel as the host. Some enterprises have worked around this by running Docker containers in a virtual environment, while some have taken the opposite approach by running virtual machines inside of Docker containers. Known as VM containers, these run just like normal containers except that they host a complete Kernel-based Virtual Machine (KVM) environment. This merges the flexibility of Docker containers with the security of virtual machines.
Unfortunately, both approaches come with drastic increases in logging complexity. Not only do we have to log the application, the Docker daemon, and the host OS, but we must also log the virtual machine and hypervisor. Logging, tagging, and associating all of these services is not just a feat of architectural engineering; it requires comprehensive solution development. It is important, though, to have the right logging strategy in place for whichever approach is chosen. Missing the opportunity to collect and aggregate the logs of one specific tier might prevent us from efficiently troubleshooting issues.
5. Methods of Logging in Docker
Like virtualization, containers add an extra layer between an application and the host OS. Logging Docker effectively means not only logging the application and the host OS, but also the Docker service.
5.1 Logging via the Application
This process is likely what most developers are familiar with. In this process, the application running inside the container handles its own logging using a logging framework. The application formats its logs and sends them to a remote destination. This is an easy and intuitive migration path for enterprises already using a logging framework in their existing applications. The logs are sent from the application to a remote centralized server, bypassing Docker and the OS. This gives developers the most control, but it also adds load to the application process.
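Since the application ships its own logs in this model, Docker's log collection can even be switched off for the container; a hedged sketch (the image name is an assumption):

```shell
# The application inside the container forwards its own logs via its
# logging framework, so Docker's logging driver is disabled entirely.
# myapp:latest is an illustrative image name.
docker run -d --log-driver=none myapp:latest
```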
5.2 Logging via Data Volumes
When dealing with Docker logs, there is one important caveat we must always keep in mind. Because containers are stateless by nature, any files created within a container will be lost when the container is removed. Instead, containers must either forward log events to a centralized logging service (such as Loggly) or store log events in a data volume.
With a data volume, we can store long-term data in our containers by mapping a directory in the container to a directory on the host machine. We can also share a single volume across multiple containers to centralize logging across multiple services. However, data volumes make it difficult to move these containers to different hosts without potentially losing log data.
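The shared-volume idea can be sketched with a named volume mounted into multiple containers (the volume, container, and image names are assumptions):

```shell
# Create a named volume and share it between two services so their
# log files land in one place.
docker volume create app-logs
docker run -d --name service-a -v app-logs:/var/log/app service-a:latest
docker run -d --name service-b -v app-logs:/var/log/app service-b:latest

# Inspect where Docker stores the volume's data on the host.
docker volume inspect app-logs
```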
5.2.1 When Should I Log via Data Volumes?
Data volumes are effective for centralizing and storing logs over an extended period. Because they link to a directory on the host machine, data volumes significantly reduce the chances of data loss due to a failed container. Because the data is now available to the host machine, we can make copies, perform backups, or even access the logs from other containers.
5.3 Logging via the Docker Logging Driver
a. One option is to forward log events from each container to the Docker service, which then sends the events to a syslog instance running on the host.
Note: With Loggly in place, we accomplish this by changing the Docker logging driver to log to syslog and then use the Configure-Syslog script to forward the events to Loggly.
b. Another option is to have the application forward its logs to a container dedicated solely to logging. That container, rather than the host OS, becomes responsible for forwarding each event to the right destination.
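Option (a) can be sketched by switching the logging driver to syslog, either per container or daemon-wide; the syslog address and image name are assumptions:

```shell
# Per-container: send this container's stdout/stderr to the host's syslog.
docker run -d \
  --log-driver=syslog \
  --log-opt syslog-address=udp://127.0.0.1:514 \
  myapp:latest

# Daemon-wide alternative: set the default driver in /etc/docker/daemon.json
#   {
#     "log-driver": "syslog",
#     "log-opts": { "syslog-address": "udp://127.0.0.1:514" }
#   }
# then restart the Docker service so the new default takes effect.
```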
5.3.1 When Should I Log via the Docker Logging Driver?
Unlike data volumes, the Docker logging driver reads log events directly from the container’s stdout and stderr output. This lets us quickly and effectively centralize our container logs by using just the Docker service. The benefit is that our containers will no longer need to write to and read from log files, resulting in a performance gain. Additionally, since log events are stored in the host machine’s syslog, they can be easily routed to Loggly.
5.4 Logging via a Dedicated Logging Container
While the two previous methods have several advantages, they share a common disadvantage: They rely on a service running on the host machine. Dedicated logging containers, on the other hand, let you manage logging from within the Docker environment. Dedicated logging containers can retrieve log events from other containers, aggregate them, then store or forward the events to a third-party service. This approach is more aligned with the microservices architecture since it eliminates your containers’ dependencies on the host machine without hindering your logging capabilities.
Dedicated logging containers can manage logs for specific containers, or they can act as a “log vacuum” for multiple containers. For example:
a. Option one - the Logspout container automatically captures stdout output from any containers running on the same host and forwards it to a remote syslog service.
b. Option two - we can use Joseph Feeney’s Logspout-Loggly container to send events from other containers directly to Loggly.
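Logspout itself runs as a container with access to the Docker socket; a sketch of option one, where the remote syslog endpoint is an assumption:

```shell
# Run Logspout: it attaches to the Docker socket, captures stdout/stderr
# from the other containers on this host, and forwards each event to the
# given syslog endpoint (logs.example.com:514 is illustrative).
docker run -d --name logspout \
  -v /var/run/docker.sock:/var/run/docker.sock \
  gliderlabs/logspout \
  syslog://logs.example.com:514
```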
5.4.1 When Should I Use a Dedicated Logging Container?
In addition to centralizing and aggregating logs, dedicated logging containers eliminate any dependencies on the host machine. Not only does this make it easier to move containers between hosts, but it lets us scale our logging infrastructure as needed by adding additional containers. Dedicated logging containers can retrieve logs through multiple streams (data volumes, stdout, etc.), making them at least as flexible as host-based logging solutions.
5.5 Logging via the Sidecar Approach
In the sidecar approach, each application container is paired with its own logging container. The first (or application) container saves its logs to a volume that can be accessed by the logging container. The second (or logging) container then uses file monitoring to tag and forward each event to the logging service. An example of this approach is the Loggly Docker container. Although similar to dedicated logging containers, sidecar containers can offer greater transparency into the origin of log events.
5.5.1 When Should I Use the Sidecar Approach?
As with dedicated logging, the key benefit of the sidecar approach is that it lets you manage logging the same way you manage your applications. Sidecar containers scale more easily than other logging methods, making them ideal for larger deployments. This approach also lets you incorporate additional tracking information specific to the logging container into each log event. By providing custom tags, we can more easily track where log events originate and which containers are actively generating logs.
The downside to this approach is that it can be complex and more difficult to set up. Both containers must work in tandem, or you may end up with incomplete or missing log data. In this case, it might be easier to use a tool such as Docker Compose to manage both containers as a single unit.
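A hedged sketch of the sidecar pairing using plain Docker commands, with a shared volume between the application and its logging container (volume, container, and image names are illustrative assumptions):

```shell
# The application container writes its log files to a shared volume.
docker volume create myapp-logs
docker run -d --name myapp \
  -v myapp-logs:/var/log/app \
  myapp:latest

# The sidecar mounts the same volume read-only, tails the files,
# and tags and forwards each event to the logging service.
# log-forwarder:latest stands in for the chosen forwarder image.
docker run -d --name myapp-logger \
  -v myapp-logs:/var/log/app:ro \
  log-forwarder:latest
```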
6. Log Contents
The Docker daemon logs two types of events:
- Commands sent to the daemon through Docker’s Remote API
- Events that occur as part of the daemon’s normal operation
6.1 Remote API Events
The Remote API lets you interact with the daemon using common commands. Commands passed to the Remote API are automatically logged along with any warning or error messages resulting from those commands. Each event contains:
- The current timestamp
- The log level (Info, Warning, Error, etc.)
- The request type (GET, PUT, POST, etc.)
- The Remote API version
- The endpoint (containers, images, data volumes, etc.)
- Details about the request, including the return type
6.2 Daemon Events
Daemon events are messages regarding the state of the Docker service itself. Each event displays:
- The current timestamp
- The log level
- Details about the event
The events themselves cover areas such as:
- Actions performed during the initialization process
- Features provided by the host kernel
- The status of commands sent to containers
- The overall state of the Docker service
- The state of active containers
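On a systemd-based host, these daemon events can typically be inspected with journalctl (the exact location of daemon logs varies by platform):

```shell
# Follow the Docker daemon's log stream on a systemd host.
journalctl -u docker.service -f

# Show only warnings and errors from the daemon since the last boot.
journalctl -u docker.service -b -p warning
```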
7. Proposed Design
One of the following two options can be implemented.
7.1 Live Logging
In this option the service writes logs directly to Azure File Storage using an asynchronous pattern.
7.2 Deferred Logging
In this option the service writes its logs locally inside the container, and a separate service (Log Service) monitors for completed log files and uploads them to Azure File Storage.
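A minimal sketch of the Log Service side using the Azure CLI; the storage account, share name, log directory, and the `.log.done` rename-on-completion convention are all assumptions for illustration:

```shell
# Upload each completed log file from the container's mounted log
# directory to an Azure File Storage share, then remove the local copy.
# Assumes the app renames finished files to *.log.done when complete.
for f in /var/log/myapp/*.log.done; do
  az storage file upload \
    --account-name mystorageaccount \
    --share-name app-logs \
    --source "$f" \
    --path "$(basename "$f")" \
  && rm "$f"
done
```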
NOTE: If we choose Loggly, the Log Service can be replaced by Loggly; if we choose Azure Log Analytics, it can be replaced by the Azure Log Analytics service. A proof of concept is required for both approaches to arrive at the appropriate solution.
8. References:
1. Andre Newman's article https://www.loggly.com/blog/top-5-docker-logging-methods-to-fit-your-container-deployment-strategy/
2. https://docs.docker.com/storage/volumes/
3. https://www.loggly.com/docs/about-loggly/
4. https://github.com/iamatypeofwalrus/logspout-loggly
5. https://www.loggly.com/blog/how-to-implement-logging-in-docker-with-a-sidecar-approach/
6. https://www.loggly.com/blog/what-does-the-docker-daemon-log-contain/
7. https://docs.docker.com/v1.10/engine/userguide/containers/dockervolumes/
8. https://dzone.com/articles/containers-5-docker-logging-best-practices
9. https://www.monitis.com/blog/containers-5-docker-logging-best-practices/
10. https://kubernetes.io/docs/concepts/cluster-administration/logging/
11. https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-overview